I. INTRODUCTION


Background


For several years, Applied Psychological Services, under the
sponsorship of the Air Force Human Resources Laboratory, has
conducted research into methods for measuring and increasing the
comprehensibility of written text (particularly textual materials
utilized during the course of Air Force technical training). The
long-range goal of this work is to identify and develop a method which
will facilitate the comprehension of textual materials employed in
the training situation. Achievement of this capability could be
expected to reduce training time and costs, and to increase training
effectiveness.


The efforts toward this long-term goal initially concentrated
on identification of methods previously employed in measuring
comprehensibility (Williams, Siegel, & Burkett, 1974) and on the
acquisition of data relative to the questions of how, and in what
training context, auditory supplementation of written materials would
increase the transfer of knowledge (Siegel, Lautman, & Burkett, 1974).
A major conclusion of these efforts was that new methods for
measuring comprehensibility are required.

Williams et al. (1974) noted that a long list of techniques for
calculation of comprehensibility measures had been offered for
consideration over the past 30 years. A sample of those considered to
be of principal interest is contained in Table 1. For convenience,
Table 1 groups the measures into three classes: structural
complexity, word divergency, and parts of speech. These deal principally
with what one might call mechanically oriented factors. They deal
with quantities of words, sentences, syllables and their rates of
occurrence, but are not concerned with meanings of words or phrases
per se. They have been in use for some time, not only because they
measure reading difficulty in some sense, but also because they are
suitable for relatively easy calculation by hand. These measures
have been used principally to determine the reading grade level (RGL)
of text. However, a major shortcoming of these measures is that
they fail to consider the inherent mental or intellectual load placed
upon the readers by the text.

Table 1

Selected Readability Measures

Author or Developer          Measure
---------------------------  ------------------------------------------
Lively & Pressey             Median index no. of sampled words on
                             Thorndike word list
Dale & Tyler                 No. different technical terms
                             No. different non-technical terms
                             No. independent clauses
Thorndike                    Proportion of words not on word list
Gray & Leary                 No. of words not on Dale list
                             No. personal pronouns
                             Sentence length
                             Percent different words
                             No. prepositional phrases
Dale & Chall                 Words not on Dale list
                             Sentence length
Gillie                       No. finite verbs
                             No. definite articles
                             No. abstract nouns
Powers, Sumner, & Kearl      Word length in syllables
  [Revision of Flesch]       Sentence length
Smith & Senter [ARI]         Sentence length
                             Word length (letters)
Caylor et al. [FORCAST]      No. one syllable words

As pointed out by Bormuth (1966), until very recent years no
theoretical base was available from which to generate testable
hypotheses relating to readability. Powerful theories of language behavior
did not exist, so that only the most obvious statistical characteristics
of the written text were studied.


This void has begun to be filled by modern linguistic and
psycholinguistic research and by models of overall intellective
functioning such as the Structure-of-Intellect of Guilford and his associates
(Guilford, 1967; Guilford, Comrey, Green, & Christensen, 1950;
Guilford, Geiger, & Christensen, 1954; Guilford & Hoepfner, 1966;
Hoepfner, Guilford, & Merrifield, 1961). The readability factors
examined in the current series of studies are based on implications
of the Structure-of-Intellect model and of experimental
psycholinguistic findings.

In earlier work in the present series, the effects of
psycholinguistically oriented and intellective function related variables on
the comprehensibility of text were demonstrated, and methods for
improving readability were defined (Siegel & Burkett, 1974).
Fourteen specific comprehensibility measures were specified, and the
feasibility of developing a computer technique to automate their
calculation was determined to have only a modest associated risk.
Siegel and Burkett concluded that: "The results supported a
contention favoring the potential of psycholinguistic and intellective
concepts for readability/comprehensibility measurement."

Purpose  of  the  Present  Program 

The current research and development effort had as its focus
the following objectives:

1. formulation of specific procedures for measuring each of a
   set of psycholinguistically and Structure-of-Intellect
   oriented variables for measuring textual comprehensibility

2. selection of measures most appropriate for use in
   automatic computation of a comprehensibility index

3. experimental study of the power and characteristics
   of the comprehensibility variables

4. development of specifications for a computer program,
   called the Comprehensibility Measures (CM) program.
   When implemented, CM would process blocks of text and
   display the results of the comprehensibility measurements.


Utilization of the Technique


Eventual application of the automatic analytic technique was
held in mind throughout the present program. A summary list of
potential uses is given below:


• determine comprehensibility of text

• isolate causes of low/high scores
    by measure

• determine placement of a written work
    by RGL

• compare comprehensibility of two or more works
    by author
    by time period
    by type of work

• compare comprehensibility between portions of a work
    by paragraph
    by page
    by chapter

• writer training (author diagnostic assistance)

• inter-author discrimination studies

• statistical data generation

• sentence parsing studies

• screening text

Consider a writer who is concerned with the comprehensibility
of his material and who has completed a written work in draft
form. He would like to submit the material to a comprehensibility
measurement analysis. Assume he has access to the computer system
for which the CM program will have been prepared. This access
could be either through the submittal of computer run requests to the
computer center or via a remote terminal. He would then:


• arrange for the text to be prepared in machine
  readable form (e.g., punched cards, magnetic tape, etc.)

• select values for a variety of parameters
  and options which describe the run (these
  are discussed in Chapter V and in more
  detail in Appendices A and B of the present
  report. They include, for example, the size
  of the text blocks into which the total text is
  to be subdivided for measurement. The CM
  program provides for determination of measures
  for each text block whose size may be
  specified in terms either of a prespecified
  number of words or identification of the start
  of prespecified blocks)

• request a computer run either: (1) to check
  the text against a currently available dictionary,
  or (2) to calculate and display the
  comprehensibility measures themselves.
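
The block-size option described above can be illustrated with a short sketch. It covers only the fixed word-count case. The function name, the word-matching pattern, and the sample sentence are assumptions made for illustration and are not taken from the CM specifications.

```python
import re


def split_into_blocks(text, words_per_block):
    """Split text into consecutive blocks of at most words_per_block words.

    Hypothetical helper: a "word" is taken to be a run of letters and
    apostrophes, which is an assumption of this sketch.
    """
    words = re.findall(r"[A-Za-z']+", text)
    return [
        words[start:start + words_per_block]
        for start in range(0, len(words), words_per_block)
    ]


sample = "The pump moves fuel. The valve stops flow. Check both daily."
blocks = split_into_blocks(sample, words_per_block=4)
for number, block in enumerate(blocks, start=1):
    # Each block would be measured separately by the later stages.
    print(number, " ".join(block))
```

The last block may be shorter than the requested size; how the CM program treats such a remainder is not stated in this section.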

The dictionary check would usually be performed first
because the calculation of certain measures depends on a prespecified
percentage of words in the text actually appearing in the dictionary.
If the dictionary check is requested, the requester will be presented
with the following types of information as output from CM:

• a list of words in the text which do not appear
  in the dictionary. These would indicate either
  a misspelling which should be corrected, or
  that additional entries should be made to the
  dictionary

• the location of the first occurrence of such
  words in the text

• the number of occurrences of each of these
  words in the text

• percentage of words not in dictionary.
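
A minimal sketch of the dictionary check outputs listed above follows. The function name, the toy dictionary, and the choice to express location as a running word position and the percentage over all word occurrences are assumptions of the sketch; the report's own formats are given in its Chapter V and Appendices.

```python
import re
from collections import Counter


def dictionary_check(text, dictionary):
    """Report words absent from the dictionary (hypothetical helper).

    Returns a list of (word, first position, occurrences) tuples and the
    percentage of word occurrences not found in the dictionary.
    """
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    counts = Counter()
    first_position = {}
    for position, word in enumerate(words, start=1):
        if word not in dictionary:
            counts[word] += 1
            # Keep only the first position at which the word was seen.
            first_position.setdefault(word, position)
    missing_total = sum(counts.values())
    percent_missing = 100.0 * missing_total / len(words) if words else 0.0
    report = [(w, first_position[w], counts[w]) for w in sorted(counts)]
    return report, percent_missing


toy_dictionary = {"the", "pump", "moves", "fuel", "to", "engine"}
report, percent_missing = dictionary_check(
    "The pump moves fuel to the enjine. The enjine starts.", toy_dictionary
)
for word, position, occurrences in report:
    print(word, position, occurrences)
print(percent_missing)
```

In this toy run "enjine" stands for a misspelling to be corrected and "starts" for a word that would need a new dictionary entry.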

If the request is for a computation of measures, the following
will be made available at several levels of detail as desired:

• the value of each of 14 measures for each
  block of text

• the measured reading grade level (RGL)
  using up to three of the "classical" (prior)
  methods, for comparison purposes

• the value of a variety of detailed counts
  used in the computation of the measures
  and RGLs

• a block by block summary of measure
  values

• a single composite value of the 14 measures
  for each block

• averages over all blocks of each measure,
  the RGLs, and the composite.

These data present the text writer with indices with which
he can:

1. compare the comprehensibility of current
   and prior works

2. compare the comprehensibility of individual
   blocks (or paragraphs or chapters) with
   each other

3. isolate the "cause" of low overall scores.

This will then specify blocks for revision, if desired. The
process may be repeated or continued for succeeding chapters or
sections of the text as they are prepared, or for revised portions
as they are rewritten or edited. In this way, the writer has an
iterative and, in a sense, a corrective procedure at his disposal so that
he may continue to strive for comprehensibility scores which are
acceptable.

An alternate use of the CM program is to assist an evaluator
in assessing the comprehensibility of a report, manual, lesson,
or other written work; i.e., essentially a quality control function.
Consider the case in which many such documents are to be
evaluated, and it is not feasible to encode completely the entire text into
machine readable form. Then, the evaluator could sample several
blocks from the text to be evaluated. The CM program would then
evaluate each block and show block as well as average per block data
over the entire sample.

Scope  of  the  Report 


Siegel and Burkett (1974) defined and investigated 14
comprehensibility measures. In Chapter II of the present report, some of
these measures have been redefined so as to facilitate their
implementation within a computer program.

Chapter III describes the methods and results of additional
experiments performed to gain insight into the operating
characteristics of the various measures.

Chapter IV presents conclusions and recommendations.

The various aspects and requirements for the CM program
are contained in Chapter V. A special dictionary is cited as a
requirement for input to this program, and its features are described.
A description of each module of the program is also presented, along
with summary logic flow diagrams and run request information.
Output results and their formats are specified in sufficient detail to
allow initiation of programming.

The Appendices contain complete guidelines for the
development of a digital computer program which will allow automatic
analysis of textual comprehensibility.


II. SELECTION AND DEFINITION OF MEASURES


The Structure-of-Intellect and the psycholinguistically
oriented comprehensibility measures are presented in Tables 2 and 3,
which show for each of 14 measures the name, abbreviation, and
associated formula used to calculate its value.

Guilford's Structure-of-Intellect Model

The Structure-of-Intellect (SI) model developed by Guilford
and his associates (Guilford, 1967; Guilford et al., 1950, 1966;
Hoepfner et al., 1964) produced a hypothetical construct as
to the nature and structure of human intellective activity.

The SI model is a cross-classification representation that
classifies intellectual abilities along three different dimensions.
Each dimension is divided into categories which intersect with the
categories of the other dimensions of ability. Mental operations
represent one dimension of classification in the SI model. The five
mental operations are: (a) cognition, (b) memory, (c) divergent
production, (d) convergent production, and (e) evaluation.

The second classification dimension of the SI model involves
the content areas of information on which the mental operations are
performed. These areas of information include: (a) figural, (b)
symbolic, (c) semantic, and (d) behavioral. Twenty separate abilities
can be derived from the combination (intersection) of the five
categories in the mental operation dimension and the four categories in the
contents dimension.


[Tables 2 and 3 (summary of measures; symbology for Table 2 formulae)
are not legible in the source. Surviving entries include Evaluation
of Symbolic Implications, Center Embedding (CE), LB, RB, and DC, and
a symbol for the total number of aids to comprehension in a text.]

The final dimension of intellect in the SI model concerns the
formal types of information dealt with. These informational types or
products can be units, classes, relations, systems, transformations,
and implications. When the six products are combined with the five
operations and with the four contents, 120 cells result. The total
model is composed of these 120 abilities and can be viewed as a
three-dimensional cube, shown in Figure 1.

Figure 1. The Structure-of-Intellect model
(from Guilford & Hoepfner, 1971).
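
The cross-classification can be enumerated directly as a check on the cell count. The sketch below uses only the category names given in the text; the variable names are illustrative.

```python
from itertools import product

OPERATIONS = ["cognition", "memory", "divergent production",
              "convergent production", "evaluation"]
CONTENTS = ["figural", "symbolic", "semantic", "behavioral"]
PRODUCTS = ["units", "classes", "relations", "systems",
            "transformations", "implications"]

# Operations crossed with contents give the 20 abilities of the first
# two dimensions; adding products gives the full set of cells.
abilities_2d = list(product(OPERATIONS, CONTENTS))
cells = list(product(OPERATIONS, CONTENTS, PRODUCTS))

print(len(abilities_2d))  # 5 * 4
print(len(cells))         # 5 * 4 * 6

# One cell of the cube, named as operation, content, product.
print(("cognition", "semantic", "units") in cells)
```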

Our hypothesis is that textual materials which require high
levels of SI oriented abilities for mastery can be said to be less
comprehensible than materials which require lower levels of these
abilities. The defined metrics can be applied to textual materials
to reflect the SI oriented abilities required to master the materials.
This involves adapting the model to a comprehensibility format such
that the degree to which a particular reading selection is loaded in
various factors may be quantified. Since the model contains 120
cells (abilities), a sample was required. To this end, those eight
abilities which seemed most relevant to the comprehensibility
problem in the Air Force technical training context were selected. Those
selected were: cognition of semantic units, cognition of semantic
relations, memory of semantic units, memory of figural units,
convergent production of semantic implications, convergent production
of semantic systems, divergent production of semantic units, and
evaluation of symbolic implications. Each of these is discussed
below; however, for precise definitions of the variables used in the
experiments described in Chapter III, the reader should refer to
Tables 2 and 3.


Factors Derived from the SI Model
Involved in Comprehensibility


Cognition of Semantic Units (CMU)

CMU in the comprehensibility context involves the extent to
which the text forces the reader to recognize a diversity of word
forms. Thus, the rhyme, "One little piggy went to market, one
little piggy stayed home" is held to be readable because of the common
word usage. The redundancy of words is held to increase readability.
The same material written as "A unitary small piggy went to market,
one little hog stayed home" is considered to be less readable than
the original text.

According to Guilford (1967), cognition of semantic units is
most directly measured by vocabulary tests. In the context of
comprehensibility, an analogous measure is the ratio of the number of
different words (types) to the total number of words (tokens). The
CMU measure is equal to the inverse of the type-token ratio,
subtracted from one. Measures of vocabulary diversity are commonly
found in existing readability formulas; the effect of vocabulary
diversity on comprehensibility was also demonstrated by Siegel and
Bergman (1974).
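
The piggy example can be worked through in a short sketch. It reads the CMU definition as one minus the type-token ratio, which is an assumption: that reading keeps the value between 0 and 1 and makes redundant text score higher, whereas a literal reciprocal would not. The function name and the word-matching pattern are also illustrative.

```python
import re


def cmu(text):
    """One minus the type-token ratio (an assumed reading of the CMU text)."""
    tokens = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    if not tokens:
        return 0.0
    types = set(tokens)
    return 1.0 - len(types) / len(tokens)


original = "One little piggy went to market, one little piggy stayed home"
variant = "A unitary small piggy went to market, one little hog stayed home"

# The original repeats "one", "little", and "piggy": 8 types in 11 tokens.
print(round(cmu(original), 3))
# The variant repeats nothing: 12 types in 12 tokens.
print(round(cmu(variant), 3))
```

Under this reading the original rhyme scores above the rewritten version, which matches the direction the text describes.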

Cognition of Semantic Relations (CMR)


CMR was defined as the extent to which the text forces the
reader to recognize the relationship between two items or words.
Guilford (1967) used analogy and word linkage tests to measure this
factor. In word linkage tests, the test taker is required to match
sets of words in terms of their relatedness or connectedness. This
factor is believed to be reflected in reading material when the reader
must form analogies or word linkages while reading. One would
expect that increasing the requirement for relational thinking in a
reading selection would decrease reading comprehension. Thus,
assuming that the demand for relational thinking is a function of the
number of common words and references between adjacent sentences,
the fewer shared common words and references between sentences,
the more difficult the text.


Convergent Production of Semantic Systems (NMS)


Guilford (1967) describes convergent production of semantic
systems as the ability to order or organize material. Siegel and
Bergman (1974) were able to manipulate comprehension by varying
organization of the number of comprehension aids which appear per
standard block of text. Examples of comprehension aids are
mnemonic devices or a figure. NMS was considered for investigation
although not included in the CM program. NMS was calculated as
the number of aids to comprehension in a block divided by four, an
arbitrarily selected constant.

Memory for Semantic Units (MMU)

According to Guilford (1967), memory for ideas is most
directly addressed by memory for semantic units. Siegel and
Bergman (1974) demonstrated that replication of facts increased
comprehensibility. The analogous measure involves the number
of different nouns appearing in a passage, divided by the number
of words in the passage. It was assumed that repeated or related
facts will be presented using the same noun as subject, allowing
that measure to serve as an index of fact repetition.


Evaluation of Symbolic Implications (ESI)


Guilford (1967) used abbreviations tests to measure
evaluation of symbolic units. A corresponding comprehensibility
measure involves the frequency of occurrence of abbreviations in
passages of running text. Siegel and Bergman (1974) demonstrated
that frequency of occurrence of abbreviations influenced
comprehension.


Convergent Production of Semantic Implications (NMI)

Guilford and Hoepfner (1971) used syllogisms, attribute
listing, missing links, and sequential association tests to
measure convergent production of semantic implications. Reading
material loaded in convergent production theoretically requires the
reader to perform syllogistic reasoning tasks. Material which does
not require this ability completes the syllogism for the reader. By
increasing the demand for convergent production of semantic
implications a text should become less comprehensible. Siegel and
Bergman (1974) found that textual material which imposed
syllogistic reasoning on the reader was less comprehensible than that
not involving such reasoning.

Identification of syllogisms by computer was considered
beyond current capabilities. Accordingly, in the current work, a
measure was employed in which the number of words in a passage
was divided by the total of the dictionary-listed numbers of parts
of speech of the words of a textual sample. This was considered to
be an index of the number of possible parses of a sentence, and was
thus analogous to a measure of syllogistic reasoning. That is, it is
assumed that a sentence becomes more ambiguous as its number of
possible parses increases. The increased load on the reader is
assumed to increase the syllogistic load required to decode the
sentence.
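
The parts-of-speech ratio just described can be sketched as follows. The toy dictionary, its part-of-speech listings, and the decision to count only words found in the dictionary are assumptions for illustration; the special dictionary required by the CM program is described elsewhere in the report.

```python
import re

# Hypothetical dictionary: each word maps to its listed parts of speech.
TOY_PARTS_OF_SPEECH = {
    "the": {"article"},
    "crew": {"noun", "verb"},
    "checks": {"noun", "verb"},
    "fuel": {"noun", "verb"},
    "lines": {"noun", "verb"},
    "daily": {"adverb", "adjective", "noun"},
}


def nmi(text, parts_of_speech):
    """Words in the passage divided by their total listed parts of speech."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    # Assumption of this sketch: words missing from the dictionary are skipped.
    known = [w for w in words if w in parts_of_speech]
    listed = sum(len(parts_of_speech[w]) for w in known)
    return len(known) / listed if listed else 0.0


print(round(nmi("The crew checks the fuel lines daily", TOY_PARTS_OF_SPEECH), 3))
print(nmi("the the", TOY_PARTS_OF_SPEECH))
```

A passage made entirely of single-class words gives 1.0, and the value falls as more words carry several possible parts of speech.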


Divergent Production of Semantic Units (DMU)

According to Guilford (1967), DMU involves the ability to
enumerate class members given certain class properties. With
regard to the comprehensibility of training texts, DMU relates to
the demand upon a reader to enumerate class members on his own
rather than have the class member supplied in the reading selection.
Siegel and Bergman (1974) demonstrated a relationship between the
presentation of example(s) and comprehension.


Psycholinguistic Measures


The second set of measures was based on concepts drawn
from the psycholinguistic literature. The literature and prior
studies in this series (Siegel & Burkett, 1974) have suggested a number
of rules for making a sentence more comprehensible. Some of
these suggestions were to: (1) decrease word depth (Bormuth, 1966;
Boss & C’rains, 1970), (2) decrease morpheme depth (Bormuth,
1966), (3) change passive to active voice when there is a possibility
of a reversal of subject and object; this reduces structural as well
as semantical problems (Gough, 1965; Slobin, 1966; Fodor, [year
illegible]), (4) avoid center embedding whenever possible (Schwartz,
Sparkman, & Deese, 1970; Wang, 1970), (5) avoid right-branching
sentences whenever possible (Schwartz et al., 1970), and (6) write
affirmative sentences when possible (Gough, 1965; Slobin, 1966).

Yngve Depth (YD)

Yngve (1960) developed a model of sentence production which
claimed that a person produces sentences by generating a "sentence
structure tree" in a top to bottom, left to right direction.
Accordingly, at any given time, a speaker has produced only that portion
of the left-hand side of the tree necessary to produce the word spoken.
As the speaker works down the tree, he produces both branches of a
node, but must store the right branch in memory while expanding
the left branch. Bormuth (1966) found that sentence depth was
correlated with the difficulty of a passage. Martin and Roberts (1966)
held sentence length constant and varied the Yngve depth of sentences
and found that sentences of lesser complexity were recalled
significantly more frequently than sentences of greater structural
complexity. The finding that mean linguistic depth is a strong predictor of
sentence comprehensibility was replicated by Wang (1970) and by
Lambert and Siegel (1974).

YD was computed in the present work by dividing the total
number of words in a textual block by the product of the number of
sentences in the block and the sum of the sentences' YD values. The
YD value for a sentence was determined by parsing and determining
the number of right branches in the sentence which had to be
anticipated by a reader or listener. This value then reflects the
overall depth of the sentences in the block.
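
The depth count can be sketched on a hand-built parse tree. The per-word depth below follows the usual statement of Yngve's procedure: for each word, count the right-hand branches still pending on the path from the root. Taking a sentence's YD value to be the sum of its word depths is an assumption of this sketch, as are the function names and the nested-list tree encoding; the CM parsing rules are not reproduced here.

```python
def yngve_depths(tree, pending=0):
    """Return (word, depth) pairs for the leaves of a nested-list parse tree.

    A leaf is a string; a node is a list of children. The depth of a word
    is the number of right siblings left unexpanded along its path.
    """
    if isinstance(tree, str):
        return [(tree, pending)]
    pairs = []
    for index, child in enumerate(tree):
        right_siblings = len(tree) - 1 - index
        pairs.extend(yngve_depths(child, pending + right_siblings))
    return pairs


def block_yd(sentence_trees):
    """Words / (sentences * sum of sentence values), as worded in the text."""
    values = [sum(depth for _, depth in yngve_depths(t)) for t in sentence_trees]
    words = sum(len(yngve_depths(t)) for t in sentence_trees)
    denominator = len(sentence_trees) * sum(values)
    return words / denominator if denominator else 0.0


# [[NP the boy] [VP ate [NP the apple]]]
tree = [["the", "boy"], ["ate", ["the", "apple"]]]
print(yngve_depths(tree))
print(block_yd([tree]))
```

The first "the" has depth 2 because both "boy" and the whole verb phrase are still pending when it is produced; the final word has depth 0.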

Morpheme Depth (MD)


A morpheme is the meaning carrying unit of language. It
does not always correspond to the syllable. For example, the word
"flower" is one morpheme. Bormuth (1966) speculated that the
comprehensibility of an individual word could be dependent on how
many morphemes are "buried" within it. For example, in the word

un/happi/ness:

    un    - morpheme denoting "not"
    happi - morpheme denoting a state of mood
    ness  - morpheme denoting a condition or quality

A person reading this word must have knowledge of the meaning of
all three morphemes in order to comprehend the word.

Lambert and Siegel (1974) found the mean number of
morphemes per word to be related to comprehensibility. Accordingly,
MD was defined as the number of words sampled divided by the
number of morphemes in the sampled words. As for all of the
measures, low values of this index were expected to be associated
with passages of low comprehensibility.
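
The MD ratio can be checked on the two example words. The segmentation table is a hypothetical stand-in for whatever morpheme information the program's dictionary would carry, and the function name is illustrative.

```python
# Hypothetical morpheme segmentations, taken from the examples in the text.
TOY_MORPHEMES = {
    "flower": ["flower"],
    "unhappiness": ["un", "happi", "ness"],
}


def morpheme_depth(words, segmentations):
    """Number of words sampled divided by the number of morphemes in them."""
    morphemes = sum(len(segmentations[w]) for w in words)
    return len(words) / morphemes if morphemes else 0.0


print(morpheme_depth(["flower"], TOY_MORPHEMES))                 # 1 word, 1 morpheme
print(morpheme_depth(["flower", "unhappiness"], TOY_MORPHEMES))  # 2 words, 4 morphemes
```

A sample of single-morpheme words gives the maximum value of 1, and the value drops as words carry more morphemes.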

Transformational Complexity (TC)

According to theories of transformational grammar,
sentences of any type or level of complexity are produced, or
interpreted, through transformations relative to simple, active
"kernels." Interpretation of passive, negative, or passive negative
sentences or independent clauses requires successively more, or
more elaborate, transformations from the basic active kernel.
Lambert and Siegel (1974) found the relative frequency of errors
of interpretation of sentences of these four classes to follow the
order described above. The present measure of TC was based
on assignment of a score based on the error rate of interpretation
of sentences of the corresponding type observed by Lambert
and Siegel. The average of these scores per sample of text
provides the TC index.


Center Embedding (CE)


Schwartz et al. (1970) demonstrated that the inclusion of
phrases or clauses between a sentence's subject and its
predicate decreased comprehensibility. This finding was confirmed
by Lambert and Siegel (1974). To measure the degree of center
embedding in a block of text, a measure was devised in which the
number of phrases (prepositional, adverbial, etc.) and clauses
(relative, adverbial, verb, complement, etc.) between the
subject(s) and predicate(s) of a sample of sentences is divided by the
number of sentences examined to yield a measure of
comprehensibility.

Right Branching (RB) and Left Branching (LB)

Schwartz et al. (1970) also found that addition of phrases
or clauses to the left end of a sentence (prior to the subject)
decreased comprehensibility. Addition of similar material
following the sentence predicate did not degrade comprehensibility.
Lambert and Siegel (1974) tested these variables and obtained
mixed results. These sentence structural aspects are evaluated
in the present study through measures which consider the
frequency of right or left branching, respectively, within a sampled group
of sentences.

Complement  Deletion  (DC) 


Certain surface structure features of sentences serve to
mark the deep, or meaning carrying, structure of sentences. The
word "that" when used as a complement as in "He said that I should
go," serves that function. Hakes (1972) found the inclusion or
deletion of this complement to affect comprehensibility. The measure,
in the present study, which reflects this variable is the ratio of the
number of occurrences of deleted complements to the number of
sentences in a passage.


Integration 


These psycholinguistically and SI oriented measures were
selected for inclusion in the CM program on the basis of the conjecture
that textual comprehensibility is a function of each.


All formulas have been scaled so that high values indicate
improved comprehensibility and lower values imply a poorer
comprehensibility on a scale of 0 to 1.

A companion report (Williams, Siegel, Burkett, & Croff, in
press) presents the results of work in which norms for each
measure and for a variety of types of documents were determined. For
completeness, the resultant tables are included here as Appendix C.
These data were used in the CM program to enable the [illegible]
of all calculated measures.


Reading Grade Levels

As part of the CM program, two classical reading grade
level (RGL) indices and one readability index are calculated. The RGLs
estimate the school grade level of the given text. These are to be
products of the CM program since the basic counts for these indices
will be available from the CM calculation sequences or will be readily
obtained.

FORCAST

The FORCAST formula was developed to measure the
readability of Army technical literature (Caylor, Sticht, Fox, & Ford, 1972).
The authors considered existing formulas inappropriate for their
purpose because school students and school or general texts had been
employed in developing prior readability formulas. This type of
standardization was believed to make suspect the applicability of prior
formulas to technical publications for adults. Moreover, application of
many of the existing formulas requires special grammatical or linguistic
competence on the part of the person attempting to apply the formulas.

A literature search provided Caylor et al. (1972) with a list
of 15 structural properties of text that had been applied in previous
readability formulas and required no special competence or
equipment to measure. Correlations between cloze score and each of the
structural properties were computed and regression equations were
derived. Their preferred formula employed only a single factor,
number of one-syllable words per passage. This factor is very easily
measured and the addition of other factors yielded no practical
increase in predictive power.

The FORCAST formula predicts reading grade level as
equal to:


20 - (number of one-syllable words/10)


The correlation between predicted RGL of a passage with tested RGL
associated with 35 percent cloze score was .37. A subsequent
application using new test passages and new subjects produced a
correlation of .77.
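
The FORCAST computation above can be sketched in a few lines. This is only an illustration: the function name `forcast_rgl` is hypothetical, and the count of one-syllable words is assumed to have been obtained already, by hand or by some other routine. The formula is normally applied to a passage of fixed length (150 words in the usual statement of FORCAST), so the count should come from a passage of that length.

```python
def forcast_rgl(one_syllable_words: int) -> float:
    """Reading grade level predicted by FORCAST.

    one_syllable_words: number of one-syllable words counted in the
    passage (assumed here to be a fixed-length sample).
    """
    return 20 - one_syllable_words / 10


# A passage containing 100 one-syllable words is placed at grade 10.
print(forcast_rgl(100))
```

More one-syllable words give a lower predicted grade level, which is the direction the formula intends.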


Automated Readability Index (ARI)

Smith and Senter (1966) developed a readability equation
whose data may be collected from mechanical counters easily
installed on an electric typewriter. This technique allows
measurement of readability at essentially rough draft typing speed.
Mechanical counters are used to record the number of key strokes,
blank spaces, and sentences (an equals sign must be typed at the
end of each sentence; the number of activations of this key
indicates the number of sentences typed). From these counts, the mean
number of words per sentence [number of spaces divided by number
of sentences (w/s)] and the mean length of words [number of strokes
divided by number of spaces (s/w)] may be computed. Based on
examination of grade school texts, the regression equation predicting
grade level from the above ratios is:

ARI RGL = 0.50(w/s) + 4.71(s/w) - 21.43.
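
As a rough sketch of the ARI computation from the three counter readings described above (the function and argument names are illustrative, not taken from Smith and Senter):

```python
def ari_rgl(strokes: int, spaces: int, sentences: int) -> float:
    """Automated Readability Index from typewriter counter readings.

    strokes   -- number of key strokes (characters typed)
    spaces    -- number of blank spaces, used as the word count
    sentences -- number of sentence-end markers typed
    """
    words_per_sentence = spaces / sentences   # w/s
    strokes_per_word = strokes / spaces       # s/w
    return 0.50 * words_per_sentence + 4.71 * strokes_per_word - 21.43


# 500 strokes, 100 words, 10 sentences: 10 words per sentence, 5 strokes per word.
print(round(ari_rgl(500, 100, 10), 2))
```

Both ratios rest on the space count standing in for the word count, which is what makes the index collectable from mechanical counters.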


Flesch Reading Ease

Probably the most popular readability formula developed
to date is that developed by Rudolf Flesch. Working in the 1940's,
Flesch concluded that sentence length is important to predicting
comprehension for adult readers. He similarly indicated that the
readers' interest in a topic should also be related to readability.
His "reading ease" formula predicts arbitrarily scaled "reading
ease" as a function of word length of sentences and number of
syllables per 100 words. His "human interest" index, also scaled
arbitrarily, is based on rates of occurrence of personal words
and of sentences addressed to the reader (Flesch, 1948).

The Flesch formulas, especially the "reading ease"
formula, have become the most widely applied in the entire history of
readability  research.  This  wide  application  is  due  in  part  to  the 
ease  of  computation  of  his  formulas  and  partly  to  the  wide  exposure 
given  to  his  formulas  through  a long  series  of  popularized  books. 

Flesch based his "reading ease" formula on data based on
a large set of reading passages normed in terms of comprehension
test score and corresponding school grade level. This set of
original passages, developed in 1926, was revised in 1950 to reflect
modern topics and changes in population reading levels. The
"reading ease" score was recomputed by Powers, Sumner, and Kearl
(1958), based on the newly available data. According to their
revision, arbitrarily scaled "reading ease" is equal to -2.2029 + (.0778)
(mean sentence length) + (.0455)(number of syllables per 100 words).
This recent form of the "reading ease" formula has been implemented
in the CM program specification.
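
A minimal sketch of the recomputed formula, using the coefficients quoted above. The function name and the choice to take word, sentence, and syllable totals as inputs are assumptions made for illustration; the CM program's own counting routines are not described here.

```python
def recomputed_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Powers, Sumner, and Kearl recomputation of the Flesch formula.

    words, sentences, syllables are totals for the passage being scored.
    """
    mean_sentence_length = words / sentences
    syllables_per_100_words = 100.0 * syllables / words
    return (-2.2029
            + 0.0778 * mean_sentence_length
            + 0.0455 * syllables_per_100_words)


# 200 words in 10 sentences with 300 syllables:
# mean sentence length 20, 150 syllables per 100 words.
print(round(recomputed_reading_ease(200, 10, 300), 4))
```

Longer sentences and more syllables per 100 words both raise the value returned.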


Cloze 


In the experimental work reported in subsequent sections of
the present report, cloze score was employed as the readability
criterion. The cloze procedure, introduced by Taylor (1953), was
demonstrated to rank standard reading passages in the same order
as did other readability formulas. In the cloze procedure, readers
are presented with samples of text, from which some words are
deleted and replaced by blank spaces. The readers are requested to
fill in the blank spaces with the words they think were deleted. To
the extent that the author uses the words that the reader expects and
understands, the reader will fill in the correct words. The technique
assumes that readability is a direct function of the number of omitted
words which the reader is able to fill in.


Taylor indicated that the ordering of cloze scores is
maintained regardless of the system employed in word deletion. He used
four different deletion systems on three standard passages, each of
a different difficulty level, and found that the rankings were the same
despite the deletion system employed. The four deletion systems
involved deleting every fifth word, every seventh word, every tenth
word, and 10 percent of the words at random. Others have reported
that deleting 20 percent of a passage will yield sensitive measures.


The  cloze  method  is  free  from  many  of  the  disadvantages 
of  the  traditional  readability  measures.  It  can  be  applied  more 
appropriately  to  highly  technical  and  unusual  materials.  Very 
technical  material  might  be  rated  as  difficult  by  the  Flesch  formu- 
la, but  not  by  the  cloze  technique,  if  the  subjects  reading  the  pas- 
sage were  trained  in  the  subject  matter  area.  Conversely  the  read- 
ability of  the  writings  of  authors  such  as  Gertrude  Stein,  who  write 
in  short  sentences  with  relatively  simple  vocabulary,  but  whose  style 
is  such  that  the  material  is  difficult  might  not  be  accurately  reflected 
by  the  Flesch  and  Dale-Chall  techniques.  The  cloze  test  might  be 
expected  to  reflect  accurately  the  difficulty  of  such  reading  passages. 

Taylor  reported  correlations  of  . 70  and  . 80  between  cloze 
scores  and  comprehension  scores  received  by  Air  Force  trainees 
on  Air  Force  technical  material. 

In developing their readability equation, Caylor et al. used
cloze score as the criterion of readability because they believed
the cloze test to be more objective than multiple choice tests or the
other more traditional indices of comprehension. They also pointed
out that cloze had "consistently yielded very high correlations with
multiple choice tests and other more subjectively constructed
measures of comprehension and difficulty" (Caylor et al., 1972, p. 12).
Additionally, as part of their work, they found a correlation of
approximately .80 between cloze score on 150-word passages chosen
from the readings required in a wide range of Army jobs and
achieving reading grade level, as measured by the United States Armed
Forces Institute Reading Achievement Test III, Form A,
Abbreviated Edition.

The inherent advantages of the cloze procedure are: (1)
scoring ease, (2) scoring reliability, (3) ease of application to
nonstandard material, and (4) accounting for the reader's interest in and prior
knowledge of the content. The disadvantages of the procedure are:
(1) cloze is a measure of readability, not a predictor of readability,
(2) a sizeable sample of subjects is required, and (3) it may not
reflect all types of comprehension.


III. FURTHER VERIFICATION AND ELABORATION OF
COMPREHENSIBILITY ASSESSMENT VARIABLES


The SI oriented variables included in the computer algorithm
were originally developed and evaluated for influence on readability
by Siegel and Bergman (1974). The psycholinguistic variables were
similarly developed and examined by Lambert and Siegel (1974). In
the cited studies, all SI based variables were found to be related to
comprehension. The relationships between the psycholinguistic
variables and comprehension were less clear cut, but the measurement
concepts remained viable.

However,  it  seemed  that  further  test  and  evaluation  of  both 
the  SI  and  the  psycholinguistically  oriented  variables  were  warranted. 
Such  work  would  allow  additional  evaluation  of  the  variables  included 
in  the  CM  program,  as  well  as  the  development  of  insights  relative 
to  interaction  effects  with  which  the  prior  work  was  not  concerned. 

In the study of psycholinguistic variables performed by
Lambert and Siegel (1974), variables were presented in individual
sentences in many cases, and a variety of comprehension
measures were used.

In  the  related  exploratory  SI  oriented  work  of  Siegel  and 
Bergman,  completion,  true-false,  and  short  answer  questions  were 
used  to  evaluate  comprehension.  The  variables  were  presented  in 
paragraphs  prepared  for  the  purpose,  but  Air  Force  technical  train- 
ing materials  were  not  involved,  and  the  range  of  the  variables  was 
not  the  same  as  that  found  in  Air  Force  technical  training  materials. 

In  the  first  three  of  the  four  studies  described  in  the  present 
chapter,  all  passages  were  developed  through  modification  of  cur- 
rent Air  Force  materials;  all  variables  approximated  the  range  of 
the  variables  found  in  Air  Force  technical  training  and  related  ma- 
terials; and  a single  criterion  measure  of  comprehension,  the  cloze 
score,  was  used. 

In the first experiment described here, text was prepared
in  which  each  psycholinguistic  and  SI  oriented  variable  was  manipu- 
lated individually  in  a controlled  manner.  Comprehension  scores 
were  analyzed  separately  to  obtain  added  information  on  the  effect 
of  modification  in  level  of  single  variables.  This  first  experiment, 
accordingly,  may  be  considered  to  represent  a cross  validation  of 
the  findings  of  Siegel  and  Bergman  (1974)  and  of  Lambert  and  Siegel 
(1974). 

Experiment  II  was  designed  to  assess  the  interactive  effects 
of  a set  of  SI  oriented  variables  as  they  are  systematically  varied 
in  level.  A parallel  investigation  of  interactive  effects  among 
selected  psycholinguistic  variables  was  completed  in  Experiment  III. 

In Experiment IV, passages were presented at
exceptionally high and exceptionally low levels of CMU. The Experiment IV
work provided some additional insight into the possible explanation
for certain of the results emerging from Experiments II and III.


Levels  of  Variables 


In Experiments I, II, and III, levels of manipulated
variables were defined with reference to a set of descriptive norms
developed by Williams, Siegel, Burkett, & Groff (in press). The
norms  were  developed  on  200  samples  of  technical  material  ran- 
domly taken  from  Air  Force  study  guides,  technical  orders,  career 
development  course  texts,  and  various  manuals  and  regulations. 
Based on analysis of these samples, norms were developed which
describe the level of each SI and psycholinguistically oriented
variable found at each decile of the samples.

"Low," "medium," and "high" levels of variables were defined in
Experiments I, II, and III with reference to these decile values. A
low level for any variable falls below the first decile value for
that variable; a medium level is defined as falling between the
fourth and sixth deciles; and a high level for any variable indicates
that the variable fell above the ninth decile value. In all cases,
variables were defined in such a manner that high numerical values
and high decile values were expected to be associated with higher
comprehensibility and low numeric or decile values were expected to
be associated with lower comprehensibility.
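
The level definitions can be expressed as a small classification rule. The sketch below is illustrative only: the function names are hypothetical, the decile cut points are computed with Python's `statistics.quantiles` (the norming study may have used a different interpolation rule), and values falling between the defined bands are simply reported as unclassified.

```python
import statistics


def decile_cutpoints(norm_values):
    """Return the nine decile cut points of a set of norming values."""
    return statistics.quantiles(norm_values, n=10)


def classify_level(value, deciles):
    """Assign "low", "medium", or "high" from nine decile cut points.

    deciles[0] is the first decile and deciles[8] the ninth.
    """
    if value < deciles[0]:
        return "low"
    if deciles[3] <= value <= deciles[5]:
        return "medium"
    if value > deciles[8]:
        return "high"
    return None  # falls in a band the experiments did not use


# Hypothetical cut points on the 0-to-1 scale used for the measures.
cuts = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(classify_level(0.05, cuts), classify_level(0.5, cuts), classify_level(0.95, cuts))
```

Leaving the intermediate bands unclassified mirrors the text, which defines only three widely separated levels.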


Comprehensibility 

In all experiments, the criterion of comprehensibility
employed was a cloze score. To obtain this score, text passages were
modified by arbitrarily deleting words at the rate of 1 of 10 or 1 of
15 and replacing the deleted words with blank spaces of standard
length. Individuals read the modified text and attempt to replace
the deleted words. The proportion of correct replacements is
called the cloze score. Since its development by Taylor (1953),
the cloze technique has become a popular criterion measure of
comprehensibility. The technique is held to be tolerant both to
variation in the system of word deletion and to scoring strictness.
Further, the technique is held to account for interest and prior
knowledge of the reader (Taylor, 1953). Materials based on the
cloze approach are easily constructed and do not require extensive
pretesting or item validation (Klare, Sinaiko, & Stolurow, 1971).


Experiment I


Experiment I sought to investigate the effects of individually
manipulating, within a textual passage, each of the psycholinguistic
and SI oriented variables on cloze scores. As such, it served to
verify and extend the work of Siegel and Bergman (1974) and of
Lambert and Siegel (1974). The Structure-of-Intellect oriented
variables involved were: CMU, CMR, MMU, ESI, NMS, NMI,
and DMU. The psycholinguistically oriented variables included
were: YD, MD, TC, CE, LB, RB, and DC.

Textual  Sample  Selection 


Forty-two passages were randomly selected from a set of
current Air Force career development course (CDC) texts. Such
texts are technical in nature and present the technical information
required for an airman to advance in a career field in the Air
Force. From two to five random selections were made from each
volume of each of the following courses: CDC 43151C (Aircraft
Maintenance Specialist, Jet Aircraft, One and Two Engines); CDC
43113 (Aircraft Mechanic); and CDC 64550 (Inventory Management
Specialist). The samples taken were approximately 300 words in
length. A sample was terminated at the first sentence end
appearing after the 300th sampled word.
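
The termination rule for a sample can be sketched as follows. This is an assumption-laden illustration: `sample_passage` is a hypothetical name, the text is assumed to be already split into whitespace-delimited tokens, a sentence end is taken to be any token ending in a period, question mark, or exclamation point (so abbreviations would be misread), and a 300th word that itself closes a sentence is taken as the stopping point.

```python
SENTENCE_END = (".", "?", "!")


def sample_passage(tokens, start, target=300):
    """Take about `target` words from `tokens`, beginning at index `start`,
    extending the sample until the last word taken closes a sentence."""
    end = min(start + target, len(tokens))
    while end < len(tokens) and not tokens[end - 1].endswith(SENTENCE_END):
        end += 1
    return tokens[start:end]


tokens = "a b c. d e f. g h".split()
# With a target of four words, the sample runs on to the end of the second sentence.
print(sample_passage(tokens, 0, target=4))
```

In practice the starting index would be drawn at random within each course volume, as the selection procedure describes.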

The selected passages were then randomly assigned to the
42 desired experimental conditions (14 variables x 3 levels of each
= 42 conditions).

Preparation  of  Stimulus  Materials 

For the three blocks selected to represent a given variable,
one block was rewritten at the low level for the variable; one block
was rewritten at the medium level for the variable; and one block
was rewritten at the high level for the variable. Thus, each of the
14 variables was tested in three different passages of structurally
varied difficulty. No attempt was made to equate passage difficulties
because the intent was to keep the passages as natural as possible
and to test the effect of modifying a variable level on
comprehensibility.

The value of the assigned variable was controlled within
each of three subsections of 100 words into which each 300 word
block was divided. This procedure ensured consistency throughout
the passage and assured the accuracy of the value of the assigned
variable.

Rewriting was performed with special emphasis on
avoiding modifications not directly related to the relevant variable. Few,
if any, sentence characteristic changes were made which were not
basic to the measure of interest.

Following the preparation of the textual passages and
verification of level of the assigned variable, cloze test forms were
prepared by deleting every 10th word, the first word deleted in each
passage being a random selection from the first 10 words of the
revised passage. Cloze test forms were typed in double spaced
format. No modifications were made to subheads. Figures, diagrams,
or tables referenced in the passages were not shown, since the
variables under investigation are only measured in connected prose.
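
The construction of a cloze form by this rule, and the scoring of a completed form, might be sketched as below. The names `make_cloze` and `cloze_score` are hypothetical. The scoring shown accepts only the exact deleted word, ignoring case; that is an assumption standing in for the study's own strict criterion, which is not reproduced here.

```python
import random

BLANK = "__________"  # blank of standard length


def make_cloze(words, step=10, rng=random):
    """Delete every `step`-th word, starting at a random position among
    the first `step` words. Returns the form and the deleted words by index."""
    first = rng.randrange(step)
    deleted = {i: words[i] for i in range(first, len(words), step)}
    if not deleted:
        raise ValueError("passage too short for any deletion")
    form = [BLANK if i in deleted else w for i, w in enumerate(words)]
    return form, deleted


def cloze_score(deleted, responses):
    """Proportion of deleted words replaced exactly (case ignored)."""
    correct = sum(
        1 for i, word in deleted.items()
        if responses.get(i, "").strip().lower() == word.strip().lower()
    )
    return correct / len(deleted)


words = [f"w{i}" for i in range(30)]
form, deleted = make_cloze(words, step=10, rng=random.Random(1))
print(len(deleted), cloze_score(deleted, dict(deleted)))
```

Passing a seeded `random.Random` instance makes the starting position reproducible, which is convenient when the same form must be regenerated.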

Test booklets were then assembled. Each booklet
contained 21 passages--either numbers 1-21 (booklet A) or 22-42
(booklet B). Presentation order in each booklet was individually
randomized. The variables assigned to each booklet are indicated
in Table 4.

Subjects


Two groups of subjects, one group representing moderate
(high school level) reading ability and the other group representing
high (college) reading ability, were involved. The moderate
reading ability group included 25 paid volunteer subjects of a public
vocational-technical high school. These subjects were enrolled in a
variety of curricula, such as auto mechanics, electrical technology,
food preparation, trowel trades, horticulture, and appliance repair.
The high reading ability group was composed of 27 paid volunteer
undergraduate college students. These subjects were majoring in
a variety of fields, but all were then enrolled in an introductory
psychology course. The assignment of subjects to passages is also
indicated in Table 4.


Procedure


Data acquisition varied between two and one-half and four
hours per subject based on individual work rate. The moderate
reading level subjects completed all passages in a single session.
The high reading level subjects completed their work in two
sessions separated by one to two days. Because of the randomization
procedures instituted and because of the rest periods allowed during
the work, concern over possible fatigue effects seems unwarranted.

All subjects were initially administered the Nelson-Denny
Reading Test (Revised), Form A (Brown, 1960). Instructions were
then presented on the procedures for completing the cloze passages.
The subjects were instructed to work at their own pace and told
that both speed and accuracy were important. Breaks in the work
were permitted. Booklets containing low numbered or high
numbered passages were distributed to the subjects randomly. This
was not a "speed test" in any sense of the term. All subjects
completed all work and no time limit was imposed.




Reading Level


As indicated in Figure 2, the moderate and the high
reading level groups were well separated in terms of Nelson-Denny
Reading Test scores/RGL. The mean Nelson-Denny score of the
moderate group was 111. (j. This value is equivalent to an RGL of
10.00. The mean Nelson-Denny score of the high reading ability
group was Of). 7, which is equivalent to an RGL above 14.0, the
highest point for which direct equivalents are reported. A
Student's t test performed on these data indicates the difference to
be statistically significant below the .01 level of confidence.

The Nelson-Denny Reading Test scores of members of the
groups receiving the low and the high numbered passages within
the moderate RGL and high RGL groups, and the overall subject
group were compared. No evidence of differences was found in
any comparison. The obtained t values were, respectively: t =
.343, df = 23; t = .077, df = 25; and t = . 3 , df = 50.
Accordingly, the hypothesis of no difference in reading ability between
the subgroups could not be rejected, and the data for the two
subgroups were combined.

Cloze Test Scoring

Taylor (1953), in initially presenting the cloze procedure,
reported that the procedure was not sensitive to the scoring criteria
employed. According to Taylor, the resultant ordering of passage
scores was not changed if synonyms of deleted words were accepted
as correct or if only precise matches were accepted. In the current
work, a relatively strict scoring procedure was followed:

FIGURE 2. DISTRIBUTION OF NELSON-DENNY RAW SCORES AND GRADE LEVEL
EQUIVALENTS  OF  MODERATE  AND  HIGH  READING  GRADE  LEVEL 
SUBJECT  GROUPS  IN  EXPERIMENT  I 
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Ill  Li  1 1 i uses,  cloze  test  sriU'cs  arc'  repi ) t * t » • c I as  percentaL'e  of  do- 
le li  d words  owto  I l\  entered,  aceordme  to  the  criterion  de.si  ribed 
aliuvr.  Ibis  is  the  normal'  clo/«  scoir. 


Analysis  of  Cloze  Test  Data 


Cloze test data related to each psycholinguistic or SI
variable were analyzed independently. Fourteen separate analyses of
variance were accordingly performed. Each variance analysis
investigated the effect on cloze score of variation of one of the 14
comprehensibility variables over three levels when RGL was varied
over two levels.

Because cell frequencies in the analyses were not equal (due
to failure of some scheduled subjects to appear), an
unweighted-means analysis was performed (Winer, 1962). The results of the
14 separate analyses of variance are presented in the paragraphs
below.
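
For readers unfamiliar with the unweighted-means approach to a two-factor design with unequal cell sizes, the sketch below shows one common form of the computation: effects are computed from cell means and scaled by the harmonic mean of the cell sizes, while the within-cell term is pooled from the raw scores. The function name and data layout are hypothetical, and this is not a reproduction of the analysis program used in the study.

```python
from statistics import mean


def unweighted_means_anova(cells):
    """Two-factor unweighted-means analysis of variance.

    cells maps (a_level, b_level) -> list of scores; no cell may be empty.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    p, q = len(a_levels), len(b_levels)
    m = {k: mean(v) for k, v in cells.items()}            # cell means
    n_h = p * q / sum(1 / len(v) for v in cells.values())  # harmonic mean n
    a_mean = {a: mean([m[a, b] for b in b_levels]) for a in a_levels}
    b_mean = {b: mean([m[a, b] for a in a_levels]) for b in b_levels}
    grand = mean(list(m.values()))
    ss_a = n_h * q * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n_h * p * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n_h * sum((m[a, b] - a_mean[a] - b_mean[b] + grand) ** 2
                      for a in a_levels for b in b_levels)
    ss_w = sum((x - m[k]) ** 2 for k, v in cells.items() for x in v)
    df_w = sum(len(v) for v in cells.values()) - p * q
    ms_w = ss_w / df_w
    table = {}
    for name, ss, df in (("A", ss_a, p - 1), ("B", ss_b, q - 1),
                         ("AxB", ss_ab, (p - 1) * (q - 1))):
        table[name] = {"SS": ss, "df": df, "MS": ss / df, "F": (ss / df) / ms_w}
    table["within"] = {"SS": ss_w, "df": df_w, "MS": ms_w}
    return table


# Made-up scores: three levels of a variable (0, 1, 2) by two reading groups.
cells = {(0, 0): [1, 3], (0, 1): [3, 5], (1, 0): [2, 4],
         (1, 1): [4, 6], (2, 0): [3, 5], (2, 1): [5, 7]}
result = unweighted_means_anova(cells)
print(result["A"]["F"], result["B"]["F"])
```

With equal cell sizes the harmonic mean equals the common cell size, and the result coincides with the ordinary equal-n analysis.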

Cognition of Semantic Units (CMU)

The results of the variance analysis relative to the CMU
variable are summarized in Table 5.


Table 5

Summary of Analysis of Variance for CMU Data


Source                   SS         df      MS         F

CMU                      . 11067    2       . 1 7:54   I1'. 67
Reading Level            .1762      1       .1762      20.01
CMU x Reading Level      .0112      2       .0076      <1 n.s.
Within Cell              .7236      () <    .0076

p < .01


Statistically significant differences were associated with both CMU
level and reading group. These differences were statistically
significant below the .01 level of confidence. The mean cloze score
at each level of CMU for the moderate and the high reading groups
is displayed graphically in Figure 3. The straight line of best fit
for these data, as derived from the least square procedure, is also
shown in Figure 3. Cloze score is seen to be positively sloped to
a slight extent with increased CMU. The high reading ability group
consistently scored higher than the low group.

Cognition of Semantic Relations (CMR)

Variation in level of the CMR variable also produced a
statistically significant effect (Table 6) on cloze score. In this
analysis, reading ability was again statistically significant at the .01
level of confidence. The mean data for this analysis are
summarized in Figure 4, which indicates that cloze score increased
substantially as CMR level was increased.


Table 6

Summary of Analysis of Variance for CMR Data


Source                   SS        df    MS       F

CMR                      .6668     2     .3334    19.73
Reading Level            .1739     1     .1739    10.29
CMR x Reading Level      .0234     2     .0117    <1 n.s.
Within Cell              1.2375    77    .0169

p < .01

FIGURE 3. MEAN CLOZE SCORE AT THREE LEVELS OF CMU
FOR MODERATE AND HIGH ABILITY READING GROUPS

FIGURE 4. MEAN CLOZE SCORE AT THREE LEVELS OF CMR FOR
MODERATE AND HIGH ABILITY READING GROUPS


Memory for Semantic Units (MMU)


As shown in Table 7, the MMU effects were statistically
significant at the .01 level of confidence, as was the reading
ability effect in this analysis. In Figure 5, the line of best fit slopes
positively with increasing level of MMU. The high reading group
mean scores were, again, greater than those of the moderate
reading group.


Table 7

Summary of Analysis of Variance for MMU Data


Source                   SS       df    MS       F

MMU                      .8246    2     .4123    37.83
Reading Level            .1741    1     .1741    15.97
MMU x Reading Level      .0337    2     .0168    1.54 n.s.
Within Cell              .7332    67    .0109

p < .01


Evaluation of Symbolic Implications (ESI)

As shown in Figure 6, cloze score increased as a function
of ESI level, and the means of high ability readers were higher than
those of moderate ability. In the associated variance analysis,
summarized in Table 8, both treatment effects were statistically
significant below the .01 level of confidence.

FIGURE 5. MEAN CLOZE SCORE AT THREE LEVELS OF MMU FOR
MODERATE AND HIGH ABILITY READING GROUPS

FIGURE 6. MEAN CLOZE SCORE AT THREE LEVELS OF ESI FOR
MODERATE AND HIGH ABILITY READING GROUPS


Table 8

Summary of Analysis of Variance for ESI Data


Source                   SS       df    MS       F

ESI                      .2471    2     .1236    10.13
Reading Level            .2449    1     .2449    20.07
ESI x Reading Level      .0247    2     .0124    1.02 n.s.
Within Cell              .8207    67    .0122

p < .01


Convergent Production of Semantic Systems (NMS)

The NMS effect and reading level were statistically significant
at the .01 level of confidence in this fifth analysis (summarized in
Table 9). Figure 7 indicates that cloze scores were lower at high
levels of NMS, and that high ability readers, on the average,
outscored moderate ability readers. The data trend was not in the
anticipated direction and is discussed in the subsequent discussion
section relative to the set of SI variables. We note here, however, that
NMS is not included in the CM program.


Table 9

Summary of Analysis of Variance for NMS Data


Source                   SS       df    MS       F

NMS                      .4853    2     .2426    26.66
Reading Level            .2022    1     .2022    22.22
NMS x Reading Level      .0213    2     .0106    1.16 n.s.
Within Cell              . 0005

FIGURE 7. MEAN CLOZE SCORE AT THREE LEVELS OF NMS FOR
MODERATE AND HIGH ABILITY READING GROUPS

FIGURE 8. MEAN CLOZE SCORE AT THREE LEVELS OF NMI FOR
MODERATE AND HIGH ABILITY READING GROUPS


Convergent Production of Semantic Implications (NMI)


As for NMS, cloze score decreased with increases in NMI.
Again, the high ability readers outscored the moderate ability
readers (Figure 8). These effects were again statistically significant. The
summary of the variance analysis of these data is presented in
Table 10. The negative slope is discussed in the discussion section
for the Structure-of-Intellect variables.


Table 10

Summary of Analysis of Variance for NMI Data


Source                   SS        df    MS       F

NMI                      1.5116    2     .7558    55.17
Reading Level            .3127     1     .3127    22.82
NMI x Reading Level      .0675     2     .0338    2.47
Within Cell              1.0554    77    .0137

p < .01

Divergent Production of Semantic Units (DMU)

Variation in DMU level, as shown in the variance analysis
summary of Table 11, produced effects which were statistically
significant below the .01 level of confidence. The reading level
effect in this analysis similarly was statistically significant below the
.01 level. Figure 9 shows that mean cloze score was positively
related to DMU value.


Table 11

Summary of Analysis of Variance for DMU Data

Source                   SS       df    MS       F




FIGURE 9. MEAN CLOZE SCORE AT THREE LEVELS OF DMU FOR
MODERATE AND HIGH ABILITY READING GROUPS

FIGURE 10. MEAN CLOZE SCORE AT THREE LEVELS OF YD FOR
MODERATE AND HIGH ABILITY READING GROUPS


Yngve  Depth  (YD) 


Variation in level of YD, a psycholinguistically oriented
variable, produced effects which were statistically significant
below the .01 level of confidence. Higher mean levels of the YD
measure were associated with elevated cloze scores (Figure 10)
and higher reading ability persons tended to achieve higher mean
cloze scores. This difference was also statistically significant at
the .01 level of confidence.


Table 12

Summary of Analysis of Variance for YD Data


Source                   SS       df    MS       F

YD                       .1776    2     .0888    10.96
Reading Level            .1570    1     .1570    19.38
YD x Reading Level       .0272    2     .0136    1.68 n.s.
Within Cell              .6243    77    .0081

p < .01


Morpheme Depth (MD)

According to the analysis summarized in Table 13, the MD
level variation produced an effect on cloze score which was
statistically significant at the .01 level of confidence. The interaction
with reading ability was statistically significant at the same level.
The high reading ability subjects consistently produced higher cloze
scores than the moderate ability readers. Figure 11 presents the
obtained trend.


Table 13

Summary of Analysis of Variance for MD Data

Source                  SS            df            MS       F

MD                     .4126           2           .2063    [illegible]
Reading Level          .2030           1           .2030    [illegible]
MD x Reading Level     .0882           2           .0441     4.12 n.s.
Within Cell            [illegible]    [illegible]  .0107

p < .01

Transformational Complexity (TC)

While variation in level of TC affected cloze score at or below
the .01 level of confidence, the mean data points (Figure 12)
for each reading ability group essentially formed a symmetrical
"V." The straight line of best fit was essentially horizontal. Again,
the reading grade level effect was statistically significant.


Table 14

Summary of Analysis of Variance for TC Data

Source                  SS            df            MS       F

TC                     .3776           2           .1888    14.52
Reading Level          .6565           1           .6565    50.50
TC x Reading Level     .0234           2           .0117    <1 n.s.
Within Cell            [illegible]    [illegible]  .0130

p < .01

Center Embeddedness (CE)

The analysis of variance summarized in Table 15 indicates
that level of CE significantly affected cloze score at or below the .01
level of confidence, as did reading ability. The trend for the line
of best fit for the CE variable is in the nonpredicted direction. This
is discussed later. Figure 13 indicates that mean cloze score
decreased with increasing level of CE. The subjects appear to have
benefitted from low levels of CE, and high ability readers appear to
have benefitted to a greater degree at this level than low ability
readers.

FIGURE 11. MEAN CLOZE SCORE AT THREE LEVELS OF MD FOR
MODERATE AND HIGH ABILITY READING GROUPS

FIGURE 12. MEAN CLOZE SCORE AT THREE LEVELS OF TC FOR
MODERATE AND HIGH ABILITY READING GROUPS

Table 15

Summary of Analysis of Variance for CE Data

Source                  SS            df            MS            F

CE                     [illegible]     2           [illegible]   [illegible]
Reading Level          .1142           1           .1142         [illegible]
CE x Reading Level     .0973           2           .0486          3.74 n.s.
Within Cell            [illegible]    [illegible]  .0130

p < .01

Left Branching (LB)

The effects of left branching on cloze score were statistically
significant at the .01 level of confidence. The effect of reading
level in this analysis was statistically significant at the .01 level of
confidence (Table 16). Figure 14 shows that mean cloze score
increased in a very orderly manner as LB increased and that the mean
difference between the moderate and the high ability readers was
consistent.

Table 16

Summary of Analysis of Variance for LB Data

Source                  SS      df     MS       F

LB                     .7036     2    .3518    25.50
Reading Level          .2043     1    .2043    14.80
LB x Reading Level     .0144     2    .0072    <1 n.s.
Within Cell            .9102    66    .0138

p < .01
FIGURE 13. MEAN CLOZE SCORE AT THREE LEVELS OF CE FOR
MODERATE AND HIGH ABILITY READING GROUPS

Right Branching (RB)


The  effects  of  right  branching  were  statistically  significant 
at  the  . 01  level.  The  plot,  shown  in  Figure  15,  reflects  a slightly 
negative slope of mean cloze score as a function of RB.

The  influence  of  reading  ability  on  cloze  score  was  not  sig- 
nificant in  this  analysis,  although  the  direction  of  the  trend  was  in 
the  anticipated  direction. 


Table 17

Summary of Analysis of Variance for RB Data

Source                  SS      df     MS       F

RB                     .1146     2    .0573     5.67
Reading Level          .0006     1    .0006    <1 n.s.
RB x Reading Level     .0101     2    .0050    <1 n.s.
Within Cell            .6736    67    .0101

p < .01


Deleted  Complement  (DC) 

Complement  deletion  affected  cloze  score  at  the  . 01  level  of 
confidence  and  the  slope  of  the  mean  cloze  score  data  as  a function 
of  the  level  of  this  variable  (Figure  16)  was  slightly  negative.  Again, 
the high reading ability subjects on the average outscored the moderate
reading ability subjects, and the difference was statistically
significant at the .01 level of confidence (Table 18).


FIGURE 15. MEAN CLOZE SCORE AT THREE LEVELS OF RB FOR
MODERATE AND HIGH ABILITY READING GROUPS

FIGURE 16. MEAN CLOZE SCORE AT THREE LEVELS OF DC FOR
MODERATE AND HIGH ABILITY READING GROUPS
Table 18

Summary of Analysis of Variance for DC Data

Source                  SS            df            MS       F

DC                     [illegible]     2           .1232    [illegible]
Reading Level          .2504           1           .2504    20.36
DC x Reading Level     .0272           2           .0136     1.11 n.s.
Within Cell            [illegible]    [illegible]  .0123

p < .01


Trend  Analyses 

The several analyses of variance were extended in order to
determine the absence or presence of significant quadratic trends
across the examined levels of the studied variables. Since the
interactions between variable and reading level were not statistically
significant in any of the cases, the analyses were continued as single
factor analyses. Techniques appropriate to unweighted means
analyses were continued.

In the cases of three variables--YD, CMR, and LB--linear
trends reached significance (p < .01), while quadratic trends did not.
In seven analyses--MMU, ESI, NMS, NMI, DMU, MD, and CE--both
the linear and the quadratic trends attained statistical significance
below the .01 level. In the remaining four cases--CMU, TC, RB,
and DC--only the quadratic trends achieved statistical significance
(p < .01). These findings indicate variables YD, CMR, and LB to
be linear in their relationship to the criterion. The comprehensibility
variables in the group of seven displayed a linear trend which
was augmented by a quadratic trend, and the variables in the final
group of four exhibited a purely quadratic trend. However, as
pointed out by Hicks (1964), a quadratic trend must be viewed with
caution in situations such as that involved here because a quadratic
equation may be passed through any three points. In order to write
the quadratic trends, the variables should be examined at additional
levels. It is for this reason that only straight lines of best fit have
been presented in Figures 3 through 16.
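
The caution about quadratic trends at three levels can be illustrated with a brief sketch. It assumes equally spaced levels and the usual orthogonal polynomial coefficients for three levels (linear: -1, 0, +1; quadratic: +1, -2, +1). The cell means are hypothetical and are not data from this study; the function names are likewise illustrative.

```python
def trend_contrasts(means):
    """Linear and quadratic contrast values for three equally spaced levels."""
    low, mid, high = means
    linear = -1 * low + 0 * mid + 1 * high
    quadratic = 1 * low - 2 * mid + 1 * high
    return linear, quadratic


def quadratic_through(points):
    """Return the quadratic (Lagrange form) passing through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points

    def curve(x):
        return (
            y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
            + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
            + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
        )

    return curve


# Hypothetical mean cloze scores at low, moderate, and high variable levels.
levels = [1, 2, 3]
hypothetical_means = [0.40, 0.55, 0.50]

linear, quadratic = trend_contrasts(hypothetical_means)
curve = quadratic_through(list(zip(levels, hypothetical_means)))

# The quadratic reproduces all three means exactly, leaving no residual
# with which to judge how well the curve describes the trend.
residuals = [abs(curve(x) - y) for x, y in zip(levels, hypothetical_means)]
print(linear, quadratic, residuals)
```

Because any three means are fitted without error, a fourth level is the minimum needed before the shape of a quadratic trend can be checked against the data.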


Discussion


Structure-of-Intellect Variable Results


In a concurrent effort, the same SI and psycholinguistic
variables as considered here were employed as a basis for developing a
multiple linear regression equation relating cloze score to variable score
(Williams, Siegel, Burkett, & Groff, in press). Comparison of the
present results with the results indicated by this regression work
provided one basis for evaluating the present indications. A second
evaluative basis rested on an integration of the present results with those
of the Siegel and Bergman (1974) study in which the same SI variables
were involved. The results of the investigation of the SI variables in
the current work and in the other mentioned efforts are summarized in
Table 19. As indicated by Table 19, significant effects and positive
correlations between level of intellectual load (assumed to be imposed
by levels of each variable) and comprehension consistently support
the SI variables as measures of textual comprehensibility, with the
exception of NMS (convergent production of semantic systems) and
NMI (convergent production of semantic implications).


Table 19

Summary of Structure-of-Intellect Results
in Current Work and Other Efforts

            Siegel & Bergman        Williams et al.         Current Work
            Proba-    Point         Proba-    Product       Proba-
Variable    bility    Biserial r    bility    Moment r      bility    R

CMU         < .025    .28           < .01      .39          < .01     [illegible]
CMR         < .002    .56           < .01      .19          < .01     [illegible]
MMU         < .005    .35           < .01      .38          < .01     [illegible]
ESI         < .001    .31           < .01      .33          < .01     [illegible]
NMS         < .005    .33           < .01      .18          < .01     [illegible]
NMI         < .001    .39           < .01     -.11          < .01     [illegible]
DMU         < .001    .38           n.s.      -.01          < .01     [illegible]
There are a number of possible reasons for the differences
between the current results and those of the related prior work
with regard to NMI. The NMI variable evolved considerably since
the  exploratory  work  of  Siegel  and  Bergman  in  which  it  was  defined 
with  regard  to  degree  of  completeness  of  syllogisms  contained  with- 
in sample  passages.  In  the  current  study,  the  variable  was  defined 
with  regard  to  the  mean  number  of  parts  of  speech  of  words  con- 
tained within  a passage.  Although  both  approaches  were  believed 
to  measure  the  same  construct,  the  possibility  of  sensitivity  differ- 
ences exists. 

Aside  from  definitional  differences,  there  were  a number  of 
other differences between the two studies. Siegel and Bergman employed
a much wider range of NMI than was employed in the present
work.  Additionally,  they  used  Air  Force  enlisted  personnel  as  sub- 
jects, as  compared  with  the  technical  vocational  high  school  and  col- 
lege subjects  of  the  present  study.  It  is  believed  that  some  similar- 
ity in  reading  ability,  interests,  and  mental  ability  exists  among  the 
various  groups,  but  the  possibility  of  subject  differences  remains 
open.  Additionally,  the  Siegel  and  Bergman  textual  materials  were 
not  of  the  technical  training  nature,  as  were  the  textual  materials 
of  the  present  work. 

The correlations resulting from the regression equation
work and included in Table 19 are correlations between
variable scores and cloze scores for the complete subject group
employed in the multiple regression work. Separate correlational
values, based on data provided by low ability readers and by high
ability readers (as defined by Nelson-Denny Reading Test scores),
were also available as the result of the regression work. The
product  moment  correlation  between  NMI  variable  score  and  cloze 
score  for  low  ability  readers  was  -.007.  The  correlation,  based 
on  high  ability  reader  data  was  -.214,  or  essentially  zero.  Ac- 
cordingly, the  negative  slope  of  comprehension  as  a function  of 
NMI  is  consistent  with  other  results  relating  level  of  NMI,  as  cur- 
rently defined,  and  cloze  score. 


NMS, convergent production of semantic systems, has been
consistently defined throughout. Accordingly, differences between
the present and the prior work cannot be attributed to definitional
problems. However, the other differences between the Siegel and
Bergman work and the present work hold and may be causative.

In this regard, we note that an extreme restriction of range was
imposed on the present NMS variable range, resulting from the
norms used to identify decile levels. The low, medium, and high
text passages had NMS values respectively of 0.000, 0.062, and
0.375, equivalent to means of 0.00, 0.20, and 1.50 comprehension
aids per hundred words of text.

Finally, the reading group by SI interactions were remarkably
free from statistical significance. This result serves to ex-
tend the  potential  of  the  findings  because  it  suggests  that  variable 
effectiveness  was  not  differential  across  reading  groups. 

Psycholinguistic Variable Results


The present data, relevant to psycholinguistically oriented
variables, are discussed here in relationship to the regression work,
the results of the exploratory work of Lambert and Siegel (1974),
and  the  results  obtained  by  other  workers  studying  the  same  lan- 
guage variables.  The  findings  in  the  current  analyses  and  correla- 
tions between  variable  level  and  cloze  score  obtained  during  the  re- 
gression work  are  summarized  in  Table  20. 

Lambert and Siegel (1974) reported Yngve depth to affect
comprehensibility inconsistently in their own work and in that of
others. The effects of this factor appeared to be partly a function of
the method by which YD was examined. Lambert and Siegel
concluded that while the variable may be method sensitive, it may be
useful in determining the mental load imposed by a written passage.
In the current analysis, a slight positive slope was found in the
functional relationship between cloze score and YD. Similarly, a
positive correlation coefficient was obtained for the corresponding
correlation in the development of the readability regression equation.
Accordingly, there is continued support for considering YD as a
factor affecting comprehensibility.


Table 20

in Current Work and Related Efforts

              Current Work               Williams et al.
                                                        Product
Variable      Probability    R             Probability  Moment r

YD            < .01          [illegible]   < .01         .13
MD            < .01          .66           < .01         .33
TC            < .01          .46           < .01         .25
CE            < .01          .85           n.s.         -.02
LB            < .01          .78           n.s.         -.04
RB            < .01          .89           < .01        -.16
DC            < .01          .56           n.s.         -.02


Lambert  and  Siegel  found  evidence  that  MD  influenced  com- 
prehension when  text  was  examined  at  the  sentence  level  and  at  the 
paragraph  level.  Surveyed  literature  also  consistently  reported 
the same relationship. In the development of the comprehensibility
regression equation, a positive correlation (.33) was found between
MD level and cloze score. In the current work, a significant
effect due to MD was found, and the slope of the trend line was
positive. Examination of Figure 11 shows that the mean moderate reading
ability subject cloze score was not highest at the high MD level.
Nonetheless,  the  variable  remains  viable  as  a comprehensibility 
variable. 

Transformational  complexity  has  been  investigated  as  a de- 
terminant of  readability  many  times.  Studies  reported  by  Lambert 
and Siegel (Coleman, 1964, 1965; Slobin, 1966; Wason, 1959, 1961;
etc.), as well as more recent work (Evans, 1972-73; Peltz, 1973-74)
support  this  concept  as  a comprehensibility  variable.  Lambert  and 
Siegel's  work  was  consistent  with  this  trend,  as  were  findings  of  the 
current  work  and  those  of  the  readability  equation  development. 
Lambert and Siegel provided moderate agreement with the
findings of Schwartz et al. (1970) and of Wang (1970), that center
embeddedness interfered with comprehension. In the current
work, a significant effect was associated with this variable. However,
the direction of the trends in both the current work and in the
regression work was opposite to that hypothesized. In these cases,
the dependent measure was cloze score, unlike those of the other
cited studies. The apparently consistent negative trend may be
evidence of method bias with regard to this variable.

Left branching and right branching seem to affect
comprehension inconsistently, judged by findings of Schwartz et al. (1970),
Hamilton and Deese (1971), and Lambert and Siegel (1974). The
current results supported left branching and right branching as
variables for predicting the comprehensibility of written materials.
The regression results were inconsistent with the present
results, suggesting these to be weak variables.

Complement deletion was hypothesized by Lambert and
Siegel to degrade comprehension, following Fodor and Garrett
(1967) and Hakes (1972). Lambert and Siegel's data did not
support their hypothesis and indicated, in fact, a trend in the opposite
direction. The current results show that deletion of complements
aided comprehension to a slight degree, and the regression
equation data do not negate these results. We note that the rarity of
occurrence of complements or their deletions should preclude the
finding, in the practical situation, of a powerful effect due to this
variable. Current theory describes complements as markers of
sentence structure. Repeated findings that such markers
interfere with comprehension would argue that current conceptions
regarding complements are incomplete.

Finally,  we  note  that  a negative  slope  or  a horizontal  trend 
does  not  negate  the  value  of  a comprehensibility  variable.  Clearly, 
such  trends  might  indicate  only  that  the  effect  of  the  variable 
diminishes  along  the  scale  or  that  the  effect  of  the  modification 
was  relatively  constant. 


Experiment  II 


The data for Experiment II were actually acquired prior to
Experiment I. The second experiment reported here sought
evidence of interactive effects among SI variables relative to
comprehensibility. To this end, each of three SI oriented measures was
varied within a factorial design. The SI variables were selected
on several bases: freedom from apparent redundancy with other
measures, minimal overlap of underlying Guilford categories,
range of numerical values of the variable as measured during
norm development, and inclusion in the Comprehensibility
Measurement (CM) computer program in its form at the time of the
study. The chosen variables were CMU, ESI, and NMI, as
indicated in Table 21.


Table 21

Comparison of Structure-of-Intellect Variables

            Normative                                                     Chosen
Variable    Range            Comments                                     Variables

CMU         .293 - .476      Two intellective categories common with      X
                             CMR; vocabulary oriented
CMR         .008 - .070      Two intellective categories common with
                             CMU; vocabulary oriented
MMU         .783 - .874      One intellective category common with
                             CMU and CMR; vocabulary oriented
ESI         .943 - 1.000     Not vocabulary oriented as are CMU, CMR,     X
                             and MMU
NMS         .000 - .375      Not in computer program
NMI         .388 - .605      Reasoning                                    X
DMU         .000 - .175      Reasoning


Various details of Experiment II were treated as they were
in Experiment I. Levels of manipulated variables were defined
with respect to the set of norms developed by Williams et al. (in
press) exactly as they were in Experiment I. The dependent
variable employed in Experiment II was, again, cloze score.

The  experimental  paradigm  is  presented  in  Figure  17. 
H = high
M = moderate
L = low


Figure  17.  Paradigm  for  Experiment  II. 
Textual Sample Selection

Twenty-seven passages were randomly selected from CDC
texts supplied for the purpose by the Air Force. From one to three
random selections were made from each text volume of each of
three courses: CDC [number illegible] (Aircraft Maintenance Specialist,
[illegible] Aircraft, One and Two Engine); CDC [number illegible]
(Aircraft [illegible]); and AFSC 64551 (Inventory Management
Specialist). The samples taken were generally 135 to 145 words in
length, and a sample was terminated after the first sentence end
appearing after the [illegible] sampled word. In a few cases, this rule
produced slightly longer samples. In these instances, if the final
sentence of the sample was a compound sentence, the sample was
terminated at a junction of adjacent independent clauses occurring
after 135 or more words.

Preparation  of  Stimulus  Materials 

All combinations of low, medium, and high level of the
three selected SI oriented measures were required for the
experiment as designed. One sample CDC passage was randomly
assigned to each of the 27 treatment conditions. The level of the CMU,
ESI, and NMI variables was then measured in each sample. Each
sample passage was then revised if necessary so that the level of
the three measured variables fell within the range assigned for that
respective sample.
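
The 27 treatment conditions follow from crossing three levels of each of the three variables. The sketch below enumerates them and assigns one passage to each at random. The passage identifiers, the seed, and the function names are assumptions for illustration only; the report does not describe how the randomization was carried out.

```python
import random
from itertools import product

LEVELS = ("low", "medium", "high")
VARIABLES = ("CMU", "ESI", "NMI")


def treatment_conditions():
    """Enumerate every combination of level across the three variables (3^3 = 27)."""
    return [dict(zip(VARIABLES, combo)) for combo in product(LEVELS, repeat=len(VARIABLES))]


def assign_passages(passage_ids, seed=0):
    """Randomly pair each passage with exactly one treatment condition.

    The seed is a hypothetical convenience for reproducibility.
    """
    conditions = treatment_conditions()
    passages = list(passage_ids)
    if len(passages) != len(conditions):
        raise ValueError("one passage is required per treatment condition")
    random.Random(seed).shuffle(passages)
    return dict(zip(passages, conditions))


# Hypothetical passage identifiers 1 through 27.
assignment = assign_passages(range(1, 28))
for passage_id in sorted(assignment)[:3]:
    print(passage_id, assignment[passage_id])
```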

It  was  possible  in  this  sort  of  rewriting  to  achieve  desired 
levels  through  manipulations  such  as  making  separate  sentences 
of complex sentences, blindly replacing words with synonyms to
influence  vocabulary  diversity,  and  the  like.  Manipulations  such 
as  these  were  not  allowed  in  the  revisions.  Modifications  were  re- 
quired to  affect  the  variable  of  interest  in  the  equations,  and  modi- 
fied materials  were  required  to  appear  similar  in  tone  and  style  to 
the  original  passages. 


The RGL of the originally selected CDC samples was
measured through application of the Automated Readability Index of
Smith and Senter (1966). RGLs ranged from 7.76 to 16.51, with
the median RGL of the sample at 11.50. In order to avoid
confounding obtained cloze scores by variation in textual sample reading
difficulty level, all revised samples were modified as necessary to
cause all measured RGLs to fall between 11.25 and 11.75. The
equation of Smith and Senter employs word length in letters and
sentence length in words for prediction of RGL. These factors were
not involved in the measures under study. Hence, revision of sample
passages for the purpose of modifying RGL should minimally affect
the independent measures studied.
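
For reference, the Automated Readability Index is commonly stated as 4.71 (characters per word) + 0.5 (words per sentence) - 21.43. The sketch below applies that formula. The tokenization rules (letters and digits count as characters; a sentence ends at a period, question mark, or exclamation point) are simplifying assumptions and are not the counting procedure documented in this report.

```python
import re


def automated_readability_index(text):
    """Estimate reading grade level with the Automated Readability Index.

    Tokenization here is a simplification: words are runs of letters or
    digits, and sentences are split on '.', '!' and '?'.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    characters = sum(len(word) for word in words)
    return 4.71 * (characters / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43


sample = "The cat sat. The dog ran."
print(round(automated_readability_index(sample), 2))
```

Because the index depends only on letters per word and words per sentence, a passage can be moved toward a target RGL by adjusting those two quantities alone.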


The  passages  were  set  in  cloze  test  form  by  deleting  every 
tenth  word  following  a word  deleted  at  random  from  the  first  ten 
words. A packet of the 27 cloze test forms with presentation order
individually  randomized  was  prepared  for  each  experimental  subject. 
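
The deletion rule just described (a random start within the first ten words, then every tenth word thereafter) can be sketched as follows. The blank marker, the whitespace tokenization, and the function name are assumptions made for illustration.

```python
import random


def make_cloze(text, interval=10, rng=None, blank="_____"):
    """Blank one word chosen at random from the first `interval` words,
    then every `interval`-th word after it. Returns the cloze text and
    the list of deleted words in order."""
    rng = rng or random.Random()
    words = text.split()
    if not words:
        return "", []
    start = rng.randrange(min(interval, len(words)))
    answers = []
    for index in range(start, len(words), interval):
        answers.append(words[index])
        words[index] = blank
    return " ".join(words), answers


# A hypothetical 135-word passage made of placeholder tokens w0 ... w134.
passage = " ".join(f"w{i}" for i in range(135))
cloze_text, answers = make_cloze(passage, rng=random.Random(1))
print(len(answers), answers[:3])
```

For a 135-word passage this rule yields 13 or 14 blanks, depending on where the random start falls.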


Test  Pacing 


A group  of  five  high  school  students  completed  a portion  of 
the  cloze  tests  to  permit  estimation  of  proper  pacing  of  the  test 
passages.  Time  required  for  individuals  to  complete  various  pas- 
sages ranged  from  8 minutes  to  14  minutes.  In  order  to  impose  a 
moderate time stress on the subsequent subjects, a time allotment
of  8 minutes  was  selected  for  work  on  a single  passage.  Accord- 
ingly, the  results  obtained  may  be  limited  to  the  situation  in  which 
a slight  time  stress  is  involved.  We  note,  however,  that  the  situ- 
ation was  not  rushed.  All  subjects  finished  all  the  work. 




Subjects 


Sixteen  paid  volunteer  subjects  participated.  Eight  were 
solicited  from  a local  college  and  eight  were  respondents  to  a clas- 
sified advertisement  seeking  individuals  who  had  terminated  their 
formal  education  during  or  prior  to  the  tenth  grade. 


Procedure 


Data collection was performed during a seven hour period
of a single day. First, the Nelson-Denny Reading Test (Revised)
Form A (Brown, 1960) was administered to all subjects. They
were then instructed in the procedures to be followed for completing
the cloze forms. Eight minutes were allowed for each sample,
and the subjects were warned when one minute of working time
remained. A ten minute break was taken after every six passages.
Fifteen passages in addition to the Nelson-Denny Reading Test were
completed in the morning. The remaining 12 were completed after
a 60 minute lunch break. The order of passage presentation was
randomized. Accordingly, fatigue effects are believed to be equally
distributed over the data.


As in Experiment I, the mean reading ability levels of the
groups of subjects were well separated. As indicated in Figure 18,
there was no overlap in the distributions of Nelson-Denny test
scores  of  the  two  groups.  The  mean  raw  score  of  the  low  read- 
ing ability  group  was  43.  8 (equivalent  to  the  4.  5 grade  level)  and 
the  mean  raw  score  of  the  high  reading  ability  group  was  ! *5.  6 (cor- 
responding to  a reading  grade  level  above  the  14th  grade).  A t test 
indicated this difference to be statistically significant below the .001
level of confidence (t = 5.82, df = 14). Accordingly, the two groups
can be considered to represent separate populations.
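
With eight subjects in each group, an independent-samples t test has 8 + 8 - 2 = 14 degrees of freedom, as reported. The sketch below shows the pooled-variance form of the test. It is an assumption that this form was used, and the raw scores are hypothetical, since individual Nelson-Denny scores are not listed in the text.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance.

    Returns (t, degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical raw scores for two groups of eight subjects.
hypothetical_high = [88, 92, 97, 101, 79, 110, 95, 103]
hypothetical_low = [35, 41, 52, 38, 47, 44, 50, 43]

t_value, df = pooled_t(hypothetical_high, hypothetical_low)
print(f"t = {t_value:.2f}, df = {df}")
```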

Cloze  Test  Scoring 

The  cloze  form  scoring  for  Experiment  II  was  identical  in 
all respects to the scoring procedures followed in Experiment I.

Data Treatment


The  data  of  Experiment  II  were  analyzed  through  a variance 
analysis.  The  results  are  summarized  in  Table  22. 

Main  Effects 

Of the SI oriented variables, only the effects of ESI were
statistically significant. Plots of the main effects are shown in
Figures 19, 20, and 21. The plots indicated that CMU was
associated with cloze score in the expected way. ESI variation affected
cloze score of low ability readers in the expected way, but its
effects on cloze scores for high ability readers, as well as the effects
of NMI on cloze for both reading ability groups, were irregular.

The large, consistent difference between the cloze scores
obtained by high and by low ability readers was statistically
significant at or below the .01 level of confidence.
FIGURE 18. DISTRIBUTIONS OF NELSON-DENNY RAW SCORES AND GRADE LEVEL
EQUIVALENTS OF MODERATE AND HIGH READING LEVEL SUBJECT GROUPS
IN EXPERIMENT II

Table 22

Summary of Analysis of Variance, Experiment II

FIGURE 21. MEAN CLOZE SCORE FOR HIGH AND LOW READING
ABILITY SUBJECTS AT THREE LEVELS OF NMI


Interactions 

Only the two-way interactions for the Structure-of-Intellect
variables are discussed. The statistically significant CMU and ESI
interaction is plotted in Figure 22. At medium to high levels of
CMU, increasing ESI appears to have raised comprehension. At
the low to medium level of CMU, modification of ESI produced
irregular changes in cloze score. It appears that removal of
abbreviations (raising ESI) improves comprehension, unless extreme
vocabulary diversity (low CMU) is present.

The statistically significant interaction between CMU and
NMI is plotted in Figure 23. Both variables, although very
different in method of measurement, were based on vocabulary
oriented measures. They may be somewhat related because the short,
common words which are most frequent in language, and which
raise CMU, are also the words of multiple parts of speech,
lowering NMI. This would seem to make the two additive (noninteractive).
However, the additive trend was not fully indicated. Specifically,
the high NMI-medium CMU point was considerably below the
expected level and the low CMU curve was almost horizontal.
Apparently, lowering the number of parts of speech was effective
when the number of different words was high but not when the
number of different words was low. This seems entirely logical.

The interaction of NMI and ESI was statistically significant
at the .01 level of confidence. The plot of this interaction is
presented in Figure 24 and, as anticipated, the plot indicates that the
interaction was relatively weak. Generally, a linear trend is
indicated by these data, with a strong indication that medium NMI
was more affected by ESI than either the high or the low NMI
levels.
FIGURE 22. INTERACTION OF CMU AND ESI

FIGURE 24. INTERACTION OF ESI AND NMI


Discussion 


The  major  purpose  of  Experiment  II  was  to  investigate  the 
possibility  of  interactive  effects  among  three  selected  SI  oriented 
variables  purported  to  measure  textual  comprehensibility.  Such 
interactivity  was  evidenced  and  was  interpreted  as  indicating  a 
complex  relationship  among  the  variables  in  the  real  life  situa- 
tion. Such  complex  effects  are  not  considered  to  be  unusual  nor 
were they unanticipated. Moreover, at least for the comprehensibility
measures employed, the two and three way interactions
seemed  reasonable  and  interpretable.  However,  the  presence  of 
such  interactions  may  have  implications  relative  to  the  rewriting 
of  text  after  a comprehensibility  score  has  been  determined. 

Simple  adjustment  of  one  variable  may  or  may  not  produce  the 
desired  result.  This  would  indicate  that  such  textual  adjustments 
must  be  made  rationally  and  that  the  total  text  must  be  reevaluated 
before  one  can  be  certain  that  the  desired  result  has  been  obtained. 

The failure of Experiment II to indicate statistically significant
findings for all main effects, along with some main effect
irregularity, is not considered to be detrimental to our general
posture relative to the SI variables. The textual samples employed
in Experiment II were quite brief. While such small samples
were sufficient for demonstrating interactive effects, longer samples
are evidently required for main effect demonstration. For example,
with a 135 word passage and with every tenth word deleted,
each variable's cloze score was based on 13 to 14 fill-ins
(135/10 = 13.5).
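The fill-in arithmetic above can be illustrated with a short sketch. It assumes that deletion starts at the tenth word and continues with every tenth word thereafter; the report does not state the starting offset, and the names `make_cloze`, `every`, and `blank` are hypothetical.

```python
def make_cloze(words, every=10, blank="_____"):
    """Return (cloze_words, deleted_words), deleting every `every`-th word.

    Assumption: counting starts at 1, so the first deletion is word
    number `every`. A different starting offset can yield one more blank.
    """
    cloze, deleted = [], []
    for position, word in enumerate(words, start=1):
        if position % every == 0:
            cloze.append(blank)
            deleted.append(word)
        else:
            cloze.append(word)
    return cloze, deleted


# Stand-in for a 135 word passage (placeholder tokens, not report text).
passage = [f"w{i}" for i in range(1, 136)]
cloze, deleted = make_cloze(passage)
print(len(deleted))  # 13 fill-ins under this starting offset
```

With a starting offset of five or less, the same 135 word passage would yield 14 blanks, which is consistent with the "13 to 14" range quoted above.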

Additionally, the textual stimulus materials were set at an
RGL equal to 11.5. This value is above the RGL of the best moderate
reading level subjects (Figure 111) and below the RGL of the
poorest high reading level subjects. It is possible that the 11.5
RGL of the textual materials was too difficult generally for the
low RGL group and too easy for the high RGL group. This may
have tended to mask main effect sensitivity.
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Experiment  III 


Experiment III paralleled Experiment II in nature and scope,
but sought to obtain evidence of interaction effects among
psycholinguistically oriented textual measures. The criteria for selecting
the included psycholinguistically oriented variables, experimental
design, and data collection procedure were nearly identical to those
of Experiment II. Only differences from the procedures of
Experiment II will be described here.

The variables chosen for investigation in Experiment III were
selected for minimal redundancy, breadth of normative range, and
inclusion in the automated textual analytic computer program. The
selected variables were: YD (an overall measure of structural
complexity), MD (a vocabulary oriented measure), and LB (which is
based on the existence of particular characteristics of sentences as
parsed for determination of Yngve depth). A comparison of the
psycholinguistically oriented variables is presented as Table 23.

Table 23

Comparison of Psycholinguistic Variables on
Experiment III Variable Selection Criteria

           Normative                                          Chosen
Variable   Range           Comments                           Variables

YD         .508 - .676     Overall sentence complexity           X
MD         .568 - .742     Vocabulary measure                    X
TC         .917 - .998     Portion of sentence complexity
SE         .123 - 1.000    Portion of sentence complexity
LB         0.000 - 1.000   Portion of sentence complexity        X
RB         .220 - .405     Not significant in Experiment I
                           Occurs rarely
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1.000 


vided  the  basis  for  the  test  passages  prepared  for  Experiment  III. 

Preparation  of  Stimulus  Materials 

The passages were randomly assigned to experimental levels
and modified as required. The policies followed for ensuring
preparation of valid passages in Experiment II were also followed
in Experiment III. Finished samples fell between grade levels
11.25 and 11.75, as measured by the Automated Readability Index,
as was the case in Experiment II.
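The report names the Automated Readability Index but does not restate its formula. As a minimal sketch, the published Smith and Senter form is ARI = 4.71 (characters/words) + 0.5 (words/sentences) - 21.43. The tokenization rules below (what counts as a word, a character, or a sentence end) are assumptions made for illustration and are not taken from the report.

```python
import re


def automated_readability_index(text):
    """Estimate a reading grade level with the Automated Readability Index.

    Assumptions: sentences end at '.', '!' or '?'; words are runs of
    letters, digits, or apostrophes; only letters and digits count as
    characters.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)


sample = "The cat sat. The dog ran."
print(round(automated_readability_index(sample), 2))
```

A passage would be accepted for the experiment only if its value fell inside the target band (11.25 to 11.75 here), and rewritten otherwise.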

The  cloze  test  forms  were  prepared  as  in  Experiment  II. 

The  order  of  test  forms  within  packets  was  individually  random- 
ized. The  procedures  of  Experiment  II  were  again  followed. 

Test  Pacing 

The  time  allowance  for  Experiment  II  passage  completion 
was  found  to  be  unnecessarily  long.  In  Experiment  III,  five  minutes 
were  allowed  for  each  passage.  The  subjects  were  warned  when  one 
minute  remained  in  the  time  allotted  for  each  passage.  Again,  all 
subjects  completed  all  passages  within  the  time  allowance  and  the 
randomization  procedures  were  assumed  to  distribute  fatigue  effects 
equally  over  the  data. 

Subjects 

Eight  college  students  (high  reading  ability)  and  eight  students 
in  a technical  job  training  program  (moderate  ability  students)  served 
as  subjects.  All  subjects  were  paid,  as  before. 


Procedure 


The design of the study paralleled that of Experiment II. Data
collection required three hours. The high reading ability students
were tested in two separate sessions of one and one-half hours each
on successive days. The moderate reading ability subjects were
tested in a single session. A ten minute break was taken every
hour.




As before, the data collection sessions began with administration
of the Nelson-Denny Reading Test, Form A (Brown, 1960).

Reading Level Results

The reading levels of members of the subject groups in the
current experiment are presented in Figure 25. As in Experiment II,
the groups were well separated in reading level. The
mean raw score of the moderate reading level group was 31.0.
This value is equivalent to grade level 7.9. The mean score of
the high ability group was 107.0. This value is above the 14th
grade level. The difference between the groups was statistically
significant below the .001 level of confidence (t = 7.85, df = 14).

Cloze  Test  Scoring 

The  cloze  test  scoring  was  completed  in  the  same  manner 
as  for  Experiment  II. 

Data Treatment

As  before,  variance  analysis  constituted  the  principal  data 
analytic  tool.  The  results  of  the  analysis  of  the  variance  of  the 
data  are  presented  in  Table  24. 

Main  Effects 

For the main effects (YD, MD, and LB), only the effects
of LB exerted a statistically significant influence. The main effect
mean data are presented in Figures 26, 27, and 28. Figure 28
indicates that the LB result was due to variation of cloze
scores of high ability readers as the level of LB was varied. There
was little, if any, variation across levels of other variables for
either reading ability group.

FIGURE 25. DISTRIBUTION OF NELSON DENNY RAW SCORES AND GRADE LEVEL
EQUIVALENTS OF MODERATE AND HIGH READING LEVEL
SUBJECT GROUPS IN EXPERIMENT III

FIGURE 26. MEAN CLOZE SCORE FOR HIGH AND LOW RGL
SUBJECTS AT THREE LEVELS OF YNGVE DEPTH

FIGURE 28. MEAN CLOZE SCORE FOR HIGH AND LOW RGL SUBJECTS
AT THREE LEVELS OF LEFT BRANCHING


Interactions 


As for Experiment II, only two-way interactions between
comprehensibility variables are discussed.

In the statistically significant interaction of YD with LB
(Figure 29), the effect of variation in LB with YD held at a high
level was quite different from the same variation with YD held at a
low or moderate level. It should be noted that these measures are
related in that LB level may be considered to be a contributor to
YD; i.e., elevation or depression of LB will tend to move the YD
value in the same direction.

The lowest mean cloze score at the low level of LB was
that of high YD. Elsewhere, cloze scores associated with high
YD were considerably above scores at low and medium YD. The
reversal at low LB may be due to the relatedness of the variables,
i.e., high YD and low LB are relative opposites. Producing
this combination of values required that the writing be somewhat
unusual in style. This characteristic may have negatively
affected comprehensibility.

The MD by LB interaction was also statistically significant
at the .01 level of confidence. This result was clearly due
to the difference in effect on comprehension as MD was varied
with LB held at the medium level, compared to the low LB and
high LB curves. This interaction is presented as Figure 30.

In the MD subexperiment of Experiment I, cloze peaked
as a function of MD at the medium value of MD. The same effect
was apparent when LB was held at low or high levels. MD
may possess the originally anticipated effect only when other
variables are held at the medium level.

The interaction between YD and MD was not statistically
significant.

FIGURE 29. INTERACTION OF YD AND LB

FIGURE 30. INTERACTION OF MD AND LB


Discussion 


Statistically significant two-way interactions were evidenced
for the psycholinguistically oriented variables, as for the SI oriented
comprehensibility variables of Experiment II. As for Experiment II,
these findings suggested that a simple additive conceptualization of
these measures in combination was not tenable. Such findings are
not unique to comprehensibility measurement. Fields such as
personality and individual ability differences must also contend with
such interactive effects. Such effects suggest that the world of
textual comprehensibility measurement is not as simple or orderly
as one would like it to be.

The  interactive  effects  noted  seem  logical  and  consistent 
with  other  data  separately  collected  during  the  course  of  the  pres- 
ent work. 

Experiment I indicated that YD and MD reliably influenced
comprehensibility.  This  result  was  not  confirmed  in  Experiment  III. 
As  in  the  case  of  Experiment  II,  it  is  believed  that  the  passages  em- 
ployed were  not  sufficiently  long  to  allow  reliable  assessment  of 
main  effects  and/or  that  the  RGL  level  of  the  materials  caused  a 
masking  effect. 
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Experiment  IV 


A fourth experiment was performed in order to further
investigate one of the variables which had not demonstrated
a consistent effect across experiments. Since passage
length was posited as a possible explanatory basis for the
inconsistency, longer passage lengths were employed in
Experiment IV than in Experiments II and III, and several
other aspects were varied. CMU was selected for consideration
in Experiment IV. The CMU variable had demonstrated
a statistically significant effect in Experiment I but not
in Experiment II.

The CMU factor is defined as 1 - [NDW(B)/TNW(B)], where
NDW(B) represents the number of different words appearing in
a textual block and TNW(B) represents the total number of
words in the block. The CMU measure is based on examination
of text blocks of 100 words. For longer passages, successive
blocks of 100 words are measured, and the individual values
are averaged.
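The CMU definition above can be sketched in a few lines. This is an illustration under stated assumptions, not the report's program: the function name `cmu`, the lower-casing of words, the word pattern, and the handling of a trailing partial block (dropped unless it is the only block) are all choices the report does not specify.

```python
import re


def cmu(text, block_size=100):
    """Average of 1 - NDW(B)/TNW(B) over successive blocks of words.

    Assumptions: words are compared case-insensitively, and a final
    block shorter than `block_size` is ignored unless the whole text
    is shorter than one block.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("no words found")
    blocks = [words[i:i + block_size]
              for i in range(0, len(words), block_size)]
    if len(blocks) > 1 and len(blocks[-1]) < block_size:
        blocks.pop()  # drop the trailing partial block (assumption)
    values = [1 - len(set(block)) / len(block) for block in blocks]
    return sum(values) / len(values)


print(cmu("a a b b"))                  # 0.5: two different words out of four
print(cmu("a a b c", block_size=2))    # 0.25: mean of 0.5 and 0.0
```

Higher values indicate more word repetition within a block, which is why the report treats a high CMU passage as the less complex one.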


Subjects

The  subjects  were  eight  college  undergraduates  enrolled 
in  an  introductory  psychology  course.  The  Nelson-Denny 
Reading  Test  (Brown,  1960)  was  employed  to  classify  the 
students  into  two  reading  ability  levels.  One  group  of 
four  students  had  an  average  score  at  the  87th  percentile 
for  grade  10  and  the  other  group  of  four  had  an  average 
score  at  the  86th  percentile  for  grade  16.  Accordingly, 
one  group  of  subjects  was  viewed  as  high  school  level 
readers,  and  the  other  group  was  viewed  as  college  level 
readers.


Reading  Materials 

The basic stimuli consisted of four passages of popular
literature. The lengths ranged between 458 and 616 words. For
convenience, they were designated A2, A3, B2, and B3. Paragraph A,
a story about cold cures taken from Reader's Digest, was written at
two extreme levels of word redundancy (A2, A3). Paragraph B, a
story about the early life of Will Rogers, also taken from Reader's
Digest, was prepared to reflect the same two extreme levels of
type/token ratio (B2, B3). Collectively, the passages define two
experimental conditions as high complexity (A3, B3) and low
complexity (A2, B2). In order to control the possible confounding of
the difficulty of the passages with the CMU measure, all passages
were brought to a common RGL (6.5), as measured by the
Automated Readability Index (Smith & Senter, 1966). A description of
the reading materials is presented as Table 25.


Table 25

Description of the Passages Employed as
Experimental Stimuli in Experiment IV


Version of Passage            CMU

Will Rogers
   Low Complexity (A2)        .52
   High Complexity (A3)       .05

Cold Cures
   Low Complexity (B2)        .51
   High Complexity (B3)       .04


Note: Subscript 1 was used to identify the original passage.
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Cloze  Form  Preparation 


Cloze  forms  were  developed  from  the  rewritten  textual  pas- 
sages in  the  same  manner  as  for  Experiments  I,  II,  and  III. 

Design  and  Procedure 


The study may be viewed as a two factor mixed design with
repeated measures over two paragraph complexity levels, and
independent subjects groups to represent two reading ability levels.
All subjects were tested in a college classroom during a regularly
scheduled introductory psychology class by their regular instructor.
During the first portion of the period, each subject was given the
Nelson-Denny Reading Test (Brown, 1960) to estimate RGL. Second,
each student was exposed to one of the passages with instructions
to fill in every deleted word. While no time limit was given,
the subjects were told that both speed and accuracy would be scored.
The procedure for the cloze data collection for the second passage
was the same as for the first passage, and second passage completion
followed the first immediately in time.

Within the low reading ability group, one of the subjects
was exposed to the low complexity version of passage A and then
to the high complexity version of passage B. Another one of the
subjects at the low reading level received the high complexity
version of passage A first and, afterwards, the low complexity
version of passage B. The remaining two subjects at the low
reading level had a counterbalanced version of the above. Accordingly,
the passage sequences were presented to separate individuals of
both high and low reading ability to achieve a completely
counterbalanced design. Table 26 describes the design of the experiment.
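The counterbalancing described above can be enumerated with a brief sketch. It assumes that the "counterbalanced version" for the remaining two subjects means the same two pairings with passage B presented first; the report does not spell this out, and the names `passage_sequences` and `design` are hypothetical.

```python
from itertools import product


def passage_sequences():
    """List the four passage sequences used within one reading group.

    Each subject sees one version of passage A and the opposite
    complexity version of passage B. Assumption: counterbalancing
    swaps which passage is presented first.
    """
    sequences = []
    for first_passage, a_level in product(("A", "B"), ("low", "high")):
        b_level = "high" if a_level == "low" else "low"
        a_version = ("A", a_level)
        b_version = ("B", b_level)
        if first_passage == "A":
            sequences.append((a_version, b_version))
        else:
            sequences.append((b_version, a_version))
    return sequences


# One sequence per subject, four subjects in each reading ability group.
design = {group: passage_sequences()
          for group in ("low reading ability", "high reading ability")}

for group, sequences in design.items():
    for subject, sequence in enumerate(sequences, start=1):
        print(group, subject, sequence)
```

Under this reading, every passage version appears once in the first position and once in the second position within each group.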

Scoring 


Cloze  scores  were  derived  employing  the  same  criteria  as 
for  Experiments  I,  II,  and  III.  Completion  time  data  were  also 
collected.


Table 26

Design for Experiment IV

                                   Passage Sequence

Low Reading Ability Subjects

High Reading Ability Subjects


Results 

Two analyses involving the separate dependent variables
were completed. The first was based on the percentage of
correctly identified deleted words. The second analysis was
based on the total time taken by the subjects to complete the task.
Tables 27 and 28 represent the summaries of these analyses.
Graphic presentations of the results appear as Figures 31 and
32. In Experiment IV, statistically significant differences were
noted between cloze scores for the high and the low CMU
materials.
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Table  27 


Summary of Cloze Score Analysis of Variance
for Experiment IV


Source                          SS     df     MS        F

Between Groups                 1470     7
   A (Reading Level)            138     1    138.0     n.s.
   Subjects Within Groups      1332     6    222.0
Within Subjects                3791     7
   B (Passage Level)           3452     1   3452.0    55.3**
   AB                            27     1     27.0     n.s.
   B x Subjects Within Groups   312     5     62.4
Total                          5261    15


** p < .01


Table 28


Summary of Working Time Analysis of Variance
for Experiment IV


Source                          SS     df     MS       F

Between Groups                   55     7
   A (Reading Level)             30     1    30       7.1*
   Subjects Within Groups        25     6     4.2
Within Subjects                  54     7
   B (Passage Level)              6     1     6       n.s.
   AB                             2     1     2       n.s.
   B x Subjects Within Groups    46     5     9.2
Total                           109    15


* p < .05

FIGURE 32. WORKING TIME REQUIRED BY TWO READING ABILITY
GROUPS AS A FUNCTION OF CMU.

RGL of the subjects did not influence their reading
comprehension as measured by the cloze test (Table 27); however,
the average time required to complete the passages differed
significantly between the low and the high reading level groups
(Table 28). Whereas the high CMU passages resulted in an average
of 58 percent correctly filled in words, the low CMU passages
resulted in only 28 percent correctly filled in words. Subject
differences in reading ability appeared when using the time measure
of reading performance. Here, the high ability readers averaged
only 7.8 minutes per passage, whereas the low ability readers
averaged 10.6 minutes per passage. It is interesting to note that
the high ability readers showed their superiority in speed but not
in comprehension.
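The significant F ratios and the sums of squares reported in Tables 27 and 28 can be rechecked from the printed entries. The sketch below uses only values that appear in those tables; the helper name `f_ratio` is hypothetical, and the ratios are formed from the mean squares as printed (so they inherit the tables' rounding).

```python
def f_ratio(ms_effect, ms_error):
    """F ratio as the effect mean square over its error mean square."""
    return ms_effect / ms_error


# Table 27 (cloze scores): B (Passage Level) against B x Subjects Within Groups.
cloze_f_passage = f_ratio(3452.0, 62.4)

# Table 28 (working time): A (Reading Level) against Subjects Within Groups.
time_f_reading = f_ratio(30, 4.2)

# Sums of squares should add up within each table.
cloze_between = 138 + 1332          # Between Groups
cloze_within = 3452 + 27 + 312      # Within Subjects
time_between = 30 + 25
time_within = 6 + 2 + 46

print(round(cloze_f_passage, 1), round(time_f_reading, 1))
print(cloze_between + cloze_within, time_between + time_within)
```

Both recomputed ratios agree with the tabled values to one decimal place, and the component sums of squares reproduce the tabled totals.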

Discussion

With some reservations, these results are taken to indicate
that CMU constitutes a variable which influences the
comprehensibility of written passages. Two salient differences among
Experiments I, II, and IV are the passage lengths and the levels of CMU
employed. Experiments I and IV, in which statistically significant
CMU effects were noted, utilized textual blocks in excess of 300
words, whereas Experiment II used 135 word blocks. Thus, it
could be conjectured that insufficient passage length was a primary
reason for the results of Experiment II. However, there is also
the possibility that the extreme levels of CMU in Experiment IV
account for the significant differences between cloze scores on
the two passages.


The purpose of the experimental portion of the present study
was to verify and clarify the effects of a set of SI and of a set of
psycholinguistically oriented variables on the comprehensibility of Air
Force technical training and related materials.

Four experiments were performed relative to the first goal.
The results of the first experiment were in substantial agreement
with prior work, performed under Air Force sponsorship, which
indicated that variation of both the SI and psycholinguistically
oriented variables does, in fact, affect the comprehensibility of text.
The second and third experiments indicated that the SI oriented
variables exerted an interactive effect on one another, as did the
psycholinguistically oriented measures. Accordingly, the results
of these two experiments suggest that comprehensibility cannot be
considered to be a simple, additive cognitive attribute. The fourth
experiment sought to verify the reason that certain main effects,
previously found to exert statistically significant effects on textual
comprehensibility, did not produce the anticipated effects in the
second/third experiments. With some reservations, due to the
nature of the experimental materials used, the results of the
fourth experiment might be viewed as supporting the contention
that the length of the textual materials employed in the second/
third experiments was insufficient. Accordingly, when the present
set of results is viewed in association with prior studies
(Siegel & Burkett, 1974) investigating the same variables, there
is a growing body of evidence supporting the potential of the SI
and of the psycholinguistically oriented variables as measures
of textual comprehensibility. In this regard, we also point to
the high multiple correlations developed by Williams, Siegel,
Burkett, and Groff (in press) relative to the power of the present
set of variables as predictors of cloze scores (overall r = .60;
high reading ability group r = .73; low reading ability group
r = .46).

While the mechanism through which these and related
variables affect comprehensibility remains to be posited, this
mechanism was conjectured at the outset (Siegel & Burkett, 1974) as an
intervening variable called mental load. A relationship was
conjectured between the level of these variables in a text and the
intellective load that the text places on the reader. Such a
relationship would help to explain why a calculus text, although written in
small words, might be difficult for a reader or why an electronics
manual, although containing somewhat large words, might be highly
comprehensible to an electronics technician. Such an intervening
variable might also help to explain certain of the interactive effects
noted. If, for example, differential intellective loads are imposed by
the variables, we would not anticipate a direct, linear
stimulus-response relationship between the variables in combination and
comprehensibility. Moreover, context may differentially affect the
variables. For example, a direct relationship may exist between CMU
in an electronics manual and intellective load when the reader is a
layman and an inverse relationship may exist for an experienced
electronic technician.

The present measures possess certain other advantages.
They were based on constructs which are believed to be meaningful
and, as such, they gain added support. Additionally, they are
amenable to objective derivation through formalized counting
procedures. Such objectivity removes user bias and judgment from
the evaluation of the comprehensibility of a text. The counting
procedures make it possible to implement the measures through
digital computer techniques.

Thus,  the  intellective  load  construct  seems  to  continue  to 
provide  a necessary  unifying  construct  for  textual  comprehensibil- 
ity analysis.  Little  was  found  in  the  present  set  of  results  to  negate 
confidence  in  this  construct.  Accordingly,  we  continue  to  support 
the  construct  for  textual  comprehensibility  measurement  purposes. 

The  interactive  effects  noted  within  Experiments  II  and  III 
are  customary  in  this  type  of  work.  While  a simple,  additive  set 
of  comprehensibility  variables  would  represent  a desirable  goal, 
such  a goal  is  probably  not  realistic.  It  seems  more  rational  to 
support  the  use  of  current  variables  and  supportable  measures  (as 
discussed  in  this  report)  while  the  search  for  new  and  better  vari- 
ables continues. 
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We  note  that  the  present  measures  are  not  normed  to  RGL. 

The  RGL  construct  has  not  proven  itself,  in  our  opinion,  to  be  en- 
tirely fruitful  or  meaningful.  For  example,  it  is  not  likely  that  an 
intelligent  adult  reading  at  the  tenth  RGL  derives  the  same  set  of 
perceptions  from  a given  text  as  a ten-year-old  reading  at  the  same 
level.  Rather,  we  have  normed  our  measures  to  other  reading  ma- 
terials. It  seems  more  reasonable  to  compare  a given  text  to  a per- 
centile value  for  other  similar  texts  than  to  RGL. 
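Norming against other reading materials, as described above, amounts to locating a text's measure within the distribution of the same measure over comparable texts. The sketch below is one way to do that; the midpoint convention for ties, the name `percentile_rank`, and the norm values are assumptions for illustration, and the numbers are invented placeholders rather than norms from this report.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, norms):
    """Percentile rank of `value` within `norms` (ties count as half)."""
    if not norms:
        raise ValueError("norms must not be empty")
    ordered = sorted(norms)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


# Hypothetical CMU values for five comparable texts (placeholders only).
norm_cmu_values = [0.18, 0.22, 0.25, 0.27, 0.31]
print(percentile_rank(0.25, norm_cmu_values))  # 50.0
```

A writer could then be told that a draft falls at, say, the 50th percentile of similar materials on a given measure, without any reference to RGL.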

The present variables say nothing about format or media.
These are considered to be more related to the utility of certain
types of written materials (e.g., operational manuals) than to
comprehensibility. Alternatively, format problems which are solved
by typographers, table setters, headings, and the like can also be
held to lighten the intellective load on the reader and accordingly
aid comprehension.

One  of  the  needs  of  the  technical  writer  is  for  an  objective 
evaluative  device  which  will  tell  him  not  only  that  a text  is  at  a giv- 
en level  of  comprehensibility  but  also  what  he  can  do  to  increase 
the  comprehensibility  of  the  text.  The  present  method  is  believed 
to  possess  such  diagnostic  value;  however,  there  has  been  no  eval- 
uation, to  date,  of  the  utility  of  the  method  in  this  regard.  In  a 
similar  vein,  technical  course  directors  and  managers  currently 
attempt  to  maintain  quality  controls  and  standards  for  training  and 
other  written  materials.  The  CM  program,  in  conjunction  with  the 
associated  norms,  can  be  employed  to  establish  and  maintain  mini- 
mally acceptable  and  desirable  comprehensibility  standards. 

With respect to the question of whether or not the present
measures are applicable to paper-and-pencil testing materials,
little is known. The present set of measures is dependent on
textual block sizes of about 300 to 500 words. Available readability
measures depend on an adequate sample of text. Yet, some seem (on the
surface) to be less dependent on block size than others. Accordingly,
there is reason to believe that some selected subset of the present
set of measures may be employable for paper-and-pencil test
comprehensibility evaluative purposes.
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On  the  basis  of  the  indications  of  the  data,  the  following  con- 
clusions seem  tenable: 

1.  The various Structure-of-Intellect and
psycholinguistically oriented measures, described
and investigated, possess potential for measuring
and providing a basis for evaluating the
comprehensibility of textual materials.

2.  The  various  measures  are  interactive  in  nature 
and  the  full  extent  of  these  interactions  remains 
unknown. 
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V.  THE  COMPREHENSIBILITY  MEASURES  (CM)  PROGRAM 


Chapter  V describes  the  structure  and  major  logic  features 
for  a comprehensibility  measures  (CM)  computer  program  which 
could  be  developed  for  performing  the  various  textual  comprehensi- 
bility measurements  discussed  throughout  this  report. 

The global flow-sequence chart for the CM program is
presented as Figure 33. At the outset, we note that the availability of
at least one dictionary for use by the CM program is assumed. Also
assumed is a version of the text to be processed in machine readable
form. These are shown in Figure 33 interfacing with the calculation
module since, for the normal run, it is expected they would be
accessed by the computer via its bulk storage.

For  input,  the  CM  program  will  accept  run  specifications  in 
a very  flexible  author-compatible  syntax.  A variety  of  parameters 
may  be  entered  to  determine  such  items  as  the  length  of  the  text  to 
be  measured,  the  nature  and  scope  of  the  run,  and  the  quantity  and 
form  of  output.  These  may  be  entered  using  a common  syntax  either 
via  interactive  terminal  (local  or  remote)  or  for  batch  processing 
(local  input  or  remote  batch  terminal)  depending  on  the  equipment 
configuration  of  the  computer  system  for  which  the  CM  program  may 
be  adapted. 

Using these input data specifications and the dictionary and
text to be processed, the CM program will perform a run whose
purpose is either a dictionary check of selected text (a CHECK run)
or the calculation of comprehensibility and other measures (a
MEASURE run). Regardless of type of run, various program modules
which comprise the CM program will be utilized. These modules
are shown in Figure 33 grouped into three categories: operating
modules (those dealing with input, program initialization and
operator interface), semantic modules (those dealing with calculating
the various comprehensibility and reading grade level measures),
and result modules (those involved with summarizing and displaying
results at various levels of detail). The modules in Figure 33
represent a summary identification; the actual names and functions
of each of the 20 program modules are given in Table 29.
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FIGURE  33  OVERVIEW  OF  THE  COMPREHENSIBILITY  MEASURES  PROGRAM 


Table 29


Operational Program Modules in Comprehensibility Measures Program


Module  Identification  Module Function(s)

  1     INITIAL         Perform initializations required for start of run
  2     READ            Control read and check of run request input data.
                        Set up run
  3     SCAN            Read and scan text for block
  4     SCANINPUT       Directly access user input token/record
  5     ERROR           Report syntax error in user's input
  6     RESET           Perform setup (reset) for processing of sentence
  7     SEARCH          Check dictionary for word. Extract information or
                        request input
  8     COUNT           Maintain running counts for sentence and block
                        summaries
  9     PARSE           Parse sentence and determine no. of possible parses
 10     MEASURE/SI      Calculate Structure-of-Intellect measures
 11     MEASURE/P       Calculate psycholinguistic measures
 12     SENTSUM         Cumulate/summarize results of sentence
 13     SENTOUT         List/display sentence results
 14     RGL             Calculate mechanical reading grade levels
 15     RGLOUT          List/display RGL measures
 16     BLOCKSUM        Summarize and normalize block results and maintain
                        over the block
 17     CHECKOUT        List/display block results of dictionary check
 18     MEASUREOUT      List/display block results of measures and
                        regression calculations
 19     RUNSUM          Summarize results over all blocks for run report
 20     RUNOUT          List/display run results


Output  results  will  be  printed  or  displayed  in  the  selected 
level  of  detail  and  at  the  terminal  location  specified  by  the  user  in- 
put. Output  includes  the  comprehensibility  measures,  words  not 
found  in  the  dictionary,  and  a variety  of  summarized  statistical 
and  parsing  results  of  processing. 
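The dictionary check performed by a CHECK run, and the "words not found in the dictionary" output mentioned above, can be illustrated with a minimal sketch. This is not the CM program's logic, which the report only outlines; the function name `check_run`, the word pattern, and the case-insensitive comparison are assumptions.

```python
import re


def check_run(text, dictionary):
    """Return the words in `text` that are absent from `dictionary`.

    Words are compared case-insensitively and each missing word is
    reported once, in order of first appearance (both are assumptions).
    """
    known = {entry.lower() for entry in dictionary}
    missing, seen = [], set()
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower()
        if key not in known and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


sample_dictionary = {"the", "cat", "sat", "on"}  # placeholder entries
print(check_run("The cat sat on the zorp", sample_dictionary))  # ['zorp']
```

In the design described in this chapter, such a list would let an author add entries to the dictionary before a MEASURE run is attempted on the same text.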

A series of utility and related support routines are also
identified in Figure 33. These are not specified in detail in this report
as they do not come within the scope of the CM specifications
directly. They are mentioned here only as a checklist reminder that their
availability would be very helpful in an operational environment in
which the CM program is expected to be used. Elaboration on the
functions of the utility subroutines is given in Table 30.
Table 30

Utility and Support Program Modules Relating
to the Comprehensibility Measures Program

Module
Identification     Module Function(s)

101  UPDATE/D      Add, delete, modify dictionary contents; calculate
                   statistical summaries
102  LIST/D        List/display dictionary contents in a number of ways
103  TEACH         List/display user instructions on CM program runs
104  CMSTAT        Maintain statistics on CM runs (time, words, measure
                   sums)
105  GRAPHPARSE    Print a graphical representation of the structure of
                   selected sentences on the line printer


Also associated with the use of the CM program, but not part
of it, are programs likely to be already available in a computer
facility which supports text editing, word processing, and similar
functions. Examples of these programs and their functions are:

     Text entry--generate text files in the format required
     by CM for comprehensibility measurement

     Text editing or update--modify and/or correct a file
     containing the text to be measured

     Text listing/display--record the current revision of a
     text file for author/editor review and/or publication,
     with or without line numbers

Figure  34  shows  the  global  logic  of  the  CM  program  in  a 
somewhat  more  detailed  form.  Each  box,  or  group  of  boxes  in  the 
chart,  presents  the  program  module  name,  number,  and  function, 
and  the  interrelationships  among  modules.  Prior  to  discussing 
this  chart,  the  concepts  of  a block  of  text  and  of  a computer  run 
are  elaborated. 


[FIGURE 34. COMPREHENSIBILITY MODEL GLOBAL FLOW LOGIC (Pages 1 and
2 of 2). Flow chart not reproduced. Legible boxes on page 1: run
initialization, INITIAL (1); control run request inputs and check;
report errors to user, ERROR (5); control text inputs and setup for
block, SCAN (3); read input data, SCANINPUT (4); reset for start of
sentence processing, RESET (6); search for next text word in
dictionary, SEARCH (7), with interactive request for new dictionary
input or batch report-only handling when a word is not found;
maintain running counts for sentence and block summaries, COUNT (8);
end-of-sentence and check-run-only decisions.]
Text Block

A "block" of text is a contiguous string of words,
abbreviations, and numbers comprising sentences, whose starting and
ending points or length are specified by the CM user. A block must
start at the beginning of a sentence and end at the end of a sentence.
The minimum block size is arbitrarily specified to be 100 words;
accordingly, the user may specify text blocks of 100 words or more.
However, since a block may not end in the middle of a sentence, the
CM program (not the user) will determine the actual number of words
per block, and this will vary from block to block. For example, if
the user specifies 500 word blocks, the blocks will be near 500 words
long, but not necessarily equal to 500. Alternatively, the user has
the option of inserting "block mark" codes into his text. This will
allow him to calculate comprehensibility measures by page of text, by
paragraph, chapter, or the like. However, placement of a block mark
in the middle of a sentence is not permitted.
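
The block rule above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the SCAN module
specification: the function name split_into_blocks, the naive
sentence splitter, and the handling of a short final block are all
hypothetical choices made for the example.

```python
import re


def split_into_blocks(text, target_words=500, min_words=100):
    """Group whole sentences into blocks of roughly target_words words.

    A block is closed at the first sentence end at or beyond the
    requested size, so actual block lengths vary from block to block.
    """
    if target_words < min_words:
        raise ValueError("block size must be at least %d words" % min_words)
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    blocks, current, count = [], [], 0
    for sentence in sentences:
        current.append(sentence)
        count += len(sentence.split())
        if count >= target_words:
            blocks.append(" ".join(current))
            current, count = [], 0
    if current:
        # Assumption: leftover sentences form a final, shorter block.
        blocks.append(" ".join(current))
    return blocks
```

With a requested size of 100 words and three-word sentences, the first
block closes at 102 words, which is the "near, but not necessarily
equal to" behavior described in the text.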

Computer Run

A computer "run" is defined as a series of iterations through
the CM program. Each iteration calculates the comprehensibility
measures, etc., for a block, until the entire specified text has been
so processed and an end-of-run summary of all blocks has been
calculated and either listed or displayed.

Equipment  and  Systems  Software  Requirements 

The CM program was planned and designed to be implemented
on the CDC Cyber 73-16 computer system at the Technical Training
Division of the Air Force Human Resources Laboratory. Languages
currently available are PASCAL and FORTRAN. The major
equipment features are:

1.  98,304 60-bit words central memory

2.  10 peripheral and control processors of 4096 12-bit words

3.  2 CRT displays with keyboard for operator console

4.  extended core storage of 503,808 60-bit words

5.  disk storage 472 million 6-bit characters, 30 ms access

6.  line printer 1200 lines per minute

7.  card reader at 1200 cards per minute

8.  4 magnetic tape units, 9-track, 800 and 1600 bpi,
    80,000 and 16,000 8-bit characters per second
    transfer rate

9.  Plato terminal
However, the CM program design and specifications were
developed for any medium to large scale digital computer system
which has peripheral capability for at least the following:

1.  a CRT terminal, with keyboard or card reader

2.  1 magnetic tape unit

3.  1 high speed line printer

4.  intermediate access storage adequate for the
    text to be measured, the dictionary and the
    operating system, discussed subsequently

To be compatible with the computing system available at the
Air Force Human Resources Laboratory, the CM program should be
implemented in the PASCAL programming language.


Storage  Requirements 

The central memory requirements of the program (these
estimates include a 20 percent reserve factor) are estimated to be:

5,000 words for global data (COMMON), files, and buffers
1,200 words for global code
9,600 words maximum size of any one module

Therefore the maximum central memory required at any one time
should be 15,300 words.

The total estimated storage requirement for all modules of
the CM program is expected to be about 50,000 to 60,000 words.
The following assumptions were made in developing the above
estimates:

1.  There will be 10 6-bit characters per word.

2.  There will be 10 words for each file as file
    control information.

3.  There will be 2 buffers and a record area for
    each file.

4.  The ability to overlay each module will exist so
    that only one module (or group of modules) need
    be in core at one time.

In addition, data files (itemized in Appendix E-1) will require
storage in mass memory.
Run Requests

For each computer run of the CM program, the user will
select and enter a variety of run request information. Runs are of
two types: (1) a CHECK run, to search the selected text and check
whether or not the text words appear in the specified dictionary file,
or (2) a MEASURE run, in which the comprehensibility and other
measures are determined. The specific syntax for calling each of
these types of runs is displayed in detail in Appendix C. A summary
of the type of input required for both types of runs is presented in
Table 31 for reference purposes. In Table 31, an asterisk indicates
those input types for which default values will be supplied by the
computer program if not given by the user, and the default column
summarizes the condition if no input is given. The only mandatory
input is that needed to identify the text input file. The large number
of optional inputs facilitates versatile run requests by the user.

Dictionary File Requirements


A dictionary file is one of the major input files used by the
CM program. This file will need to be developed as part of the
program development effort, since no dictionary is known to exist
which contains all of the various required data. Table 32 summarizes
the principal information relating to the dictionary. Part A of the
table shows the eight items of information which are stored for each
word in the dictionary. Seven of these are provided as input for each
dictionary entry. The eighth is updated by the CM program itself.

Part B of Table 32 presents header information relating to
the specific dictionary in use; these items are maintained by the CM
program. It is anticipated that various users of the CM program
will find that more than one dictionary will be required in order to
accommodate runs for various texts. (Of course, only one dictionary
is required per run.) The CM program provides for selecting a
single dictionary by name on each run. Specialty dictionaries may
result from continued use of the CM program. Such dictionaries
will serve to reduce computer run times since, in general, smaller
dictionary sizes will result in shorter run times.

Other Files


Part C of Table 32 presents the two additional files required
as CM program input. In conjunction with the Run Request Syntax,
a file of words which play the role of introducing explanations is
required, as well as a file containing cliches which may occur in the
text.
Table 32

Dictionary Requirements

A.  CONTENTS: the following items will be stored for each word,
    symbol and abbreviation expected to be encountered.

             Item                                   Data
Source       No.   Item                             Type  Name         Range

INPUT DATA   1     Number of morphemes              N     NOMORE       1 to 9
INPUT DATA   2     Number of parts of speech        N     NOPARTS      1 to 9
INPUT DATA   3     Parts of speech                  A     PART(P)      1 to 9
INPUT DATA   4     Number of syllables              N     NOSYLLABLES  1 to 15
INPUT DATA   5     Negativity indicator             A     NEGIND       P or N
INPUT DATA   6     Symbolic abbreviation indicator  A     SYMBOLIND    Y or N
INPUT DATA   7     Number of words                  N     NOWORDS      1 to 5
CM Program   8     Number of references to word     N     NOREFS       0 to 9999

B.  HEADER INFORMATION: the following data will be maintained by CM
    for each CHECK or MEASURE run.

             Item                                   Data
Source       No.   Item                             Type  Name         Range

INPUT        1     File name (file ID)              A     FILEID       15 symbols
INPUT        2     Dictionary specialty type        A     SPECIALTY    15 symbols
CM Program   3     Last change date                 N     DICTDATE     XX-XX-XX
UPDATE/D     4     Total no. of parts of speech -   N     TOTAL PARTS  XXXXX
Utility            all words
CM Program   5     Total number of references made  N     REFERENCES   XXXXXX
                   to all entries

C.  FILES: the following files will be maintained in support of the
    dictionary.

             File
Source       No.   File Name                        Name        Size of File

input        1     Explanation introducer words     EXPLANITRO  30 phrases,
                                                                each 30 AN symbols
input        2     Cliches                          CLICHE      10 phrases,
                                                                each 30 AN symbols

Data type code:
N = numeric
A = alphanumeric




CM  Program  Modules 

Table 29 presented the title and general function of each of
the 20 program modules comprising the CM program. Each of the
module specifications, presented in Appendix B, includes the
following information:

NAME:             Full and abbreviated name of the program module

NUMBER:           Serial number of the program module (see Table 29)

PURPOSE:          Brief description of the function of the program
                  module and its purpose in the program.

TECHNIQUE:        Semantic organization or other approach to be
                  utilized in the execution or calculation of the
                  program module.

INPUT:            A list of the coded names of all variables required
                  as inputs to the program module. Refer to Appendix E
                  for a complete list of all data items.

FILES ACCESSED:   A list of the names of all files which are accessed
                  by the program module.

GLOBAL DATA:      A list of the coded names of all data available
                  to all program modules which are accessed or changed
                  by the module.

OUTPUT:           A list of the coded names of all variables generated
                  by the program module.

MODULES CALLED:   A list of the coded names of all other program
                  modules which are called by the program module.

CALLING MODULES:  A list of the coded names of other program modules
                  which call the program module.

FLOW LOGIC:       A summary flow logic (if required) showing
                  sequencing of major call tasks.

NOTES:            Notes on the program module such as difficulty,
                  size, importance, or the like.


Although not explicitly stated in Appendix B, it is assumed
that the structured programming technique will be utilized in the
development of each of the CM program modules. As part of the
approach, and to facilitate program checking, a queue-trail list
giving the module numbers in the sequence in which they are called
during a run should be introduced, and a count of the number of times
each module is entered should be maintained.
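
One way to picture the suggested checking aids is sketched below in
Python. It is an illustration only: the names call_trail,
entry_counts, and traced are hypothetical, and the two stub functions
merely stand in for the RESET (6) and COUNT (8) modules.

```python
from collections import Counter

call_trail = []            # module numbers, in the order they were called
entry_counts = Counter()   # number of times each module was entered


def traced(module_number):
    """Decorator that records every entry to a numbered module."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            call_trail.append(module_number)
            entry_counts[module_number] += 1
            return func(*args, **kwargs)
        return wrapper
    return decorator


@traced(6)
def reset():
    """Stand-in for the RESET module."""


@traced(8)
def count(word):
    """Stand-in for the COUNT module."""


reset()
count("never")
count("place")
```

After these three calls the trail reads 6, 8, 8 and the entry count for
module 8 is 2, which is the kind of record a programmer could compare
against the expected flow of the global flow chart.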
A running commentary on the total program may be obtained by
simultaneously reviewing the Appendix B specifications, principally
the purpose and technique sections, while referring to the global
flow chart, Figure 34.

Sentence  Processing 


Each sentence is processed independently. This is done by
searching for each word's record in the dictionary file (SEARCH
module) and then maintaining the principal sentence-oriented running
tallies required for the calculations of the comprehensibility
measures (COUNT module). The process repeats for all words
in the sentence and, when completed, the processing of the total
sentence proceeds with sentence parsing (PARSE module) and
calculations of measures (MEASURE/SI and MEASURE/P modules).

Parsing 


While most of the SI measures may be obtained using numeric
processing, several in the psycholinguistic category require
actual parsing of each sentence of the text. Given the state of the
art in automated sentence parsing, automatic sentence parsing
represents one of the most difficult aspects of CM program
implementation. Hays (1967) reviewed the basic techniques of parsing,
with an emphasis on implementation. Aho and Ullman (1972) also
presented a thorough review of the various parsing techniques,
emphasizing the type of parser which is most appropriate for languages
of various descriptions. Yet, to the best of our knowledge of the
current technology, no automatic parser exists which will meet the
requirements of the CM program.

To date, a completely general system which will "understand"
arbitrary input text in English and act on this understanding is well
beyond technological capability. Even the limited capability to parse
a large number of general sentences correctly is unavailable. In
routine auding and reading tasks a person uses the context in which
information is presented to resolve ambiguities, but this process
requires such a wealth of specialized information that it is all but
impossible to conceive of automating it completely.

Computerized systems have been successfully developed, at
least on a pilot scale, which have structures containing some
specialized information at their disposal, and as a result can extract
unambiguous meaning when related to a small part of a single subject.
However, a general system has not been devised. However useful a
parser of the specialized type may be in other applications, parsers
which operate only on specialized input texts have serious handicaps
for use in the CM program. The main disadvantage for general work
as well as for the CM program is the fact that they do not work well
with ordinary English. Their versions of the language are often
"telegraphic English," in which words are omitted, coded, and/or
abbreviated. In other cases, such parsers employ a quasi-logical
application-unique notation, and in still other cases, highly
restricted dictionaries or grammar rules are involved. These are
clearly inapplicable to the CM program, whose goal is to process any
English prose text, as written.

A parsing technique was sought which will produce all possible
grammatically legitimate parses of ambiguous English sentences.
The technique developed will be applied within the CM program to
each sentence of each text block to be measured for
comprehensibility. The principal output from the parser will be the
identification of all sentence elements (parts of speech, clauses,
phrases, etc.) for each sentence.

To illustrate the ambiguity problem, consider Figure 35,
which attempts to disambiguate the sentence: "Never place your
fingers in the cutting area." The second column of Figure 35
presents the potential parts of speech for each word in the sentence.
These are obtained in the CM program by the relatively simple
expedient of a dictionary lookup, which would be required regardless
of the parsing scheme. This sample eight-word sentence has many
more than one possible parse because some of its constituent words
have multiple parts of speech potential. The third column of Figure
35 indicates that up to 1 x 2 x 2 x 1 x 1 x 2 x 3 x 1 = 24 parses are
possible without bringing sentence grammar logic to bear. The main
task of the parser, then, is to reduce the number of potential parses
to one or a relatively small number.
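
The count of 24 can be checked with a short Python sketch. The word
list and part-of-speech labels are copied from Figure 35; the variable
names are chosen here for illustration and are not CM program data
names.

```python
from itertools import product
from math import prod

# (word, possible parts of speech), as listed in Figure 35.
SENTENCE = [
    ("Never", ["adverb"]),
    ("place", ["noun", "verb"]),
    ("your", ["pronoun", "objective"]),
    ("fingers", ["noun"]),
    ("in", ["preposition"]),
    ("the", ["article", "adverb"]),
    ("cutting", ["adjective", "noun", "verb"]),
    ("area", ["noun"]),
]

# Upper bound on parses: the product of the per-word counts.
upper_bound = prod(len(parts) for _, parts in SENTENCE)

# The same bound, obtained by enumerating every assignment explicitly.
assignments = list(product(*(parts for _, parts in SENTENCE)))
```

Both upper_bound and len(assignments) come to 24. Each element of
assignments is one candidate category assignment that the parser would
either accept or reject using the grammar rules.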

As a byproduct, the parser should identify selected parts of
speech and generate a tree structure. It must be able to handle
context-free phrase structure grammar, where the logical rules of the
grammar are built into the parsing program module. The main output
will be the selection and identification of the most likely part of
speech for each word in the sentence. In the example, the desired
parser output would be the underlined part of speech in the second
column.

The parser, however, must also produce its output in a form
suitable for use as input to other program modules which will
accomplish additional calculations. These parsing requirements
generated by other modules are identified below:


Sample
Sentence                                  No. of Possible
Words       Possible Parts of Speech      Parts of Speech

Never       adverb                               1
place       noun, verb                           2
your        pronoun, objective                   2
fingers     noun                                 1
in          preposition                          1
the         article, adverb                      2
cutting     adjective, noun, verb                3
area        noun                                 1

FIGURE 35. POSSIBLE PARSES FOR SAMPLE SENTENCE
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h<  vtn  in  the  F'igun  ■ • :xat ..  I • . Phi  r<  sirement 

t . tie  imj  Lies  that  i phrasi  tructured,  context  fre< 
i ars iiir  scheme  i j required. 

T Ti’ansiorraational  Complexity 

Detei >m  1 n< • wh< sther  the  ma in  . i . ■ verl  is  oi  i forn 
which  causes  it  to  be  considered  active,  passivi  , 
a t i v<  *-n  ;gat  i v<  - , r . a;  v<  -negati  v<  . 

CE-  Intel  Embedding 

Determine  th(  n imi  ei  : phrass  t th<  "right” 
d the  sui  j < tc  t verl  in  the  sentence . 

LB  - Right  Bran  hing  md  Left  Bran  hing 

etermine  the  n imi ei  i hained  mod ifying  clau  f 
on  the  right  : the  ibject  oi  Left  1 : th«  ibji 

DC-  Deleted  Complement 

I etei  mine  whs ithei  i ( th<  sub  j<  t . : th<  enten  e is 
b ject  oi  i m tifying  Laus<  in  wl  icl  th<  relative 
pronoun  has  been  deleted. 

MR  - :og n i t Lon  o:  . • :m  mtic  Relation 

Identify  th<  noun  in  th<  entence. 


The algorithm selected for the CM program is a bottom-up,
left to right, context free phrase structured procedure. It consists
of a preparser and a parser designed to handle any general English
language sentence using a series of parsing logic rules. A
preliminary set of these parsing rules is contained in Appendix F.

The application of the parsing rules begins with consideration
of the first two words in the sentence to be parsed. A rule is
sought relating them and, if found, they are aggregated into a new
logical entity (sentence category). If not, words 2 and 3 are
attempted. In this way, these parsing rules are used repetitiously
to develop the parse trees for the sentence under consideration.
Each time a parsing rule is applied, the current parse tree is
extended upwards from the words comprising the sentence toward the
peak node representing the total sentence. Thus, a completed tree
can be thought of as a sequence of applications of the parsing rules.


The rules are of the following format:


     NP -> Det N         (a noun phrase may be composed
                         of a determiner and a noun)


which is interpreted as follows: whenever the sequence of categories
on the right side of the rule appears in the part of the parse tree
which has been completed so far, the tree can be extended upwards from
these categories to the category on the left side of the rule. Three
data arrays are associated with the set of parsing rules: LEFT,
RIGHT, and LENGTH. LEFT contains the categories which appear
on the left side of the arrow in the respective rules; RIGHT contains
the categories appearing on the right side of the rules, so that the
ith member of the array RIGHT contains as many categories as there
are categories on the right side of the arrow in the ith rule; and
LENGTH is an array in which the ith member is the number of
categories on the right side of the arrow in the ith rule.
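
A minimal Python sketch of the three rule arrays and of a bottom-up,
left to right reduction is given below. The four rules are those
listed in Figure 36. The functions reduce_once and parse are
hypothetical, and the sketch is deliberately simplified: it applies the
first rule that matches and follows a single path, whereas the parser
specified for the CM program must account for all legitimate parses.

```python
# Rule i is LEFT[i] -> RIGHT[i]; LENGTH[i] is the size of its right side.
LEFT = ["S", "NP", "VP", "NP"]
RIGHT = [["NP", "VP"], ["N"], ["V", "NP"], ["Det", "N"]]
LENGTH = [len(right) for right in RIGHT]


def reduce_once(categories):
    """Apply the first matching rule, scanning positions left to right.

    Returns (new_categories, rule_number) or None if no rule applies.
    Rule numbers are 1-based, as in Figure 36.
    """
    for start in range(len(categories)):
        for i in range(len(LEFT)):
            end = start + LENGTH[i]
            if categories[start:end] == RIGHT[i]:
                reduced = categories[:start] + [LEFT[i]] + categories[end:]
                return reduced, i + 1
    return None


def parse(categories):
    """Reduce a category sequence to S; return the rules applied, or None."""
    applied = []
    while categories != ["S"]:
        step = reduce_once(categories)
        if step is None:
            return None          # this category assignment does not parse
        categories, rule_number = step
        applied.append(rule_number)
    return applied
```

For the category sequence Det N V Det N ("The good survive the bad"),
parse returns the rule sequence 4, 4, 3, 1, which matches the Rule
column of the parse table in Figure 36 read from the bottom up.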

During this process, the program module maintains files for
each word of the sentence under consideration, as shown in Figure
36, which illustrates the parsing of the sentence: "The good
survive the bad." The parsing module is executed once for each
possible combination of assignments of categories to the words of the
sentence. The preparser routine partially develops this table on the
basis of one of the allowable assignments of categories to words and
phrases. The preparser partially generates the columns in the
parse table (Figure 36) for category, pointer, signal, and rule, and
transfers control to the parser. The appendix contains the
specifications for the parsing module.
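
The average Yngve depth shown in Figure 36 can be reproduced from the
Pointer and YDtts columns of the parse table. The Python sketch below
is an illustration only: the dictionary layout and function names are
assumptions, and the blank YDtts entry for the top-most S row is taken
to be 0.

```python
# row number: (category, pointer to next category above, YDtts)
PARSE_TABLE = {
    1: ("Det", 6, 1),
    2: ("N", 6, 0),
    3: ("V", 8, 1),
    4: ("Det", 7, 1),
    5: ("N", 7, 0),
    6: ("NP", 9, 1),
    7: ("NP", 8, 0),
    8: ("VP", 9, 0),
    9: ("S", 0, 0),   # assumption: blank YDtts for S treated as 0
}


def path_depth(row):
    """Sum the YDtts values on the path from a word up to the top S."""
    total = 0
    while row != 0:
        _, pointer, ydtts = PARSE_TABLE[row]
        total += ydtts
        row = pointer
    return total


def average_yngve_depth(word_rows):
    """Average of the path sums over the words of the sentence."""
    return sum(path_depth(row) for row in word_rows) / len(word_rows)


depths = [path_depth(row) for row in range(1, 6)]
average = average_yngve_depth(range(1, 6))
```

The five path sums are 2, 1, 1, 1, and 0, giving a total of 5 and an
average of 5/5 = 1, in agreement with the value printed in Figure 36.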

Sentence  Summary 


A summary of all sentence results (SENTSUM module) and
listing or display (SENTOUT module) concludes the sentence
processing. Control reverts to circle B in Figure 33 if the sentence
processed is not the last one in the text block; this provides for
repeating the process for each sentence. If the end of the block has
been reached, the traditional measures are determined (RGL module) and
optionally recorded (RGLOUT module). Appendix D shows the output
formats. The BLOCKSUM module then summarizes the results of each
SAMPLE SENTENCE: The good survive the bad.

[Parse tree diagram with per-word Yngve depths not reproduced.]

Average Yngve depth = 5/5 = 1

Simplified PARSE-TABLE

Row  Category  Pointer  Signal  Rule  YDtts  Source

 1   Det          6        1      0     1    Original sentence
 2   N            6       -1      0     0    Original sentence
 3   V            8        1      0     1    Original sentence
 4   Det          7        1      0     1    Original sentence
 5   N            7       -1      0     0    Original sentence
 6   NP           9        1      4     1    Pre-parser
 7   NP           8       -1      4     0    Pre-parser
 8   VP           9       -1      3     0    Parser
 9   S            0        0      1          Parser

Column notes:
  Category   Taken from dictionary
  Pointer    Points to next category above
  Signal     1 = left, 0 = other
  Rule       Identifies rule used

RULES:                      ABBREVIATIONS:
  1 = S  -> NP VP             NP  - noun phrase
  2 = NP -> N                 N   - noun
  3 = VP -> V NP              VP  - verb phrase
  4 = NP -> Det N             V   - verb
                              S   - sentence
                              Det - determiner

To calculate average Yngve depth, for each path from the top-most S of the
tree to the individual words of the sentence, add the YDtts which have been
assigned to the branches of the tree as above. Then add together all these
sums and divide by the number of words in the sentence.


Figure 36. Example of parsing technique.
sentence and generates data required to be displayed in the
MEASUREOUT module. The normalization of measures into percentiles over
the 5th to 95th percentile range is included in BLOCKSUM. The
tables used in this process are based on actual measurement of Air
Force technical materials and are given in Appendix G. Using these
percentile scores for each measure, BLOCKSUM also calculates
the composite index representing a single comprehensibility score
for the entire textual block.

Processing then returns to circle A of Figure 33 to perform
resets and process the next textual block. When the last
block has been processed, the RUNSUM module develops all summary
data required to list or display results of the entire run for
all blocks. The RUNOUT module controls this display or listing to
complete the run processing.

Output  Results 


Formats for each of the five types of output are specified
in Appendix D:

Figure    Types of Output       Automatic/Optional

D-1       Sentence Summary      O
D-2       Dictionary Check      A (CHECK RUN)
D-3       Block Summary         O
D-4       Run Summary           A (MEASURE RUN)
D-5       RGL Summary           O

On these reports, the line and word number are given to orient the
user, allowing him to correlate the computer results with any given
sentence or block of text by number. When requested, the dictionary
check output displays all words in text which were searched in the
dictionary and found not to be included. A given word is listed
only once in this output, together with the number of times it occurs
in the block. The percentage of words not found in the dictionary is
listed for each block and for the total run. The space provided at
the right of the listing/display is for comment by the analyst or
editor. This provides the analyst with a convenient location for
entering his instructions for future processing. On a word-by-word
basis, the analyst can enter his decision to make a:

                                  Code

     spelling change in text        1
     new dictionary entry           2
     dictionary word change         3

This same listing then can be used as an input for required changes
in text or dictionary before another run is made.
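
The dictionary check tally described above can be sketched in Python
as follows. The function name dictionary_check is hypothetical, and
two details are assumptions: words are compared in lower case, and the
percentage is computed over word occurrences rather than over distinct
words.

```python
from collections import Counter


def dictionary_check(block_words, dictionary):
    """List each missing word once with its count, plus percent not found."""
    missing = Counter(
        word.lower() for word in block_words if word.lower() not in dictionary
    )
    if block_words:
        percent = 100.0 * sum(missing.values()) / len(block_words)
    else:
        percent = 0.0
    return sorted(missing.items()), percent


words = "the blade guard covers the blade".split()
listing, percent_missing = dictionary_check(words, {"the", "covers"})
```

For this six-word example the listing contains "blade" with a count of
2 and "guard" with a count of 1, and the percentage not found is 50.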

The block results show values and percentiles for each
measure and for the major variables, together with the composite
indices. The text will be printed (with or without line numbers) if
so requested in the LIST option of the run request.

The results for the run present all measure values both as
calculated and after conversion into percentiles on a block-by-block
basis. Composite indices are also shown. Some specific tallies of
useful variables (number of words per block, average number of
parses per sentence, number of sentences parsed and not parsed,
average number of morphemes per word) are also specified. The
last report, of RGLs and the ARI index, is optional but similar in
organization to the run summary report.

Error/ Condition  Messages 

A variety of conditions may arise during a CM program run
which terminate a run or which represent situations of which the
user must be made aware. A list of these conditions and the
specific wording for the messages to be recorded or displayed are
specified in Table 33. Messages will be displayed on the user's
interactive terminal or recorded in the printout, depending on whether
the run request syntax <LIST LOCATION> is TERMINAL or
PRINTER (default is PRINTER).


Table 33


Error Message Identification


                                              SOURCE    ACTION
CODE  ERROR MESSAGE                           MODULE    CODE

  1   MISSING "CHECK" OR "MEASURE."           READ
  2   UNIDENTIFIED REQUEST.                   READ      C
  3   INVALID <FILE ID>.                      READ      C
  4   NO SUCH TEXT LINE.                      SCAN      E
  5   NO SUCH WORD NUMBER IN TEXT LINE.       SCAN      E
  6   FILE NOT DICTIONARY FILE.               READ      D
  7   FILE NOT PRESENT.                       READ      D
  8   FILE NOT TEXT FILE.                     READ      A
  9   FILE NOT EXAMPLE FILE.                  READ      D
 10   FILE NOT CLICHE FILE.                   READ      D
 11   SYNTAX ERROR IN < > SPEC.               READ      C

ACTION CODE:

A = Abort run
E = Produce final reports and terminate
D = Use default file; abort if default not present
C = Continue syntaxing input, then abort


Wherever possible, the program is forced to continue rather
than be terminated at the point the error or condition is recognized
and the message given. Table 33 identifies those errors or
conditions which should result in programmatic termination, and also
specifies the conditions under which the CM program will continue.
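
Table 33 lends itself to a simple lookup structure, sketched below in
Python. The dictionary layout and the function action_for are
hypothetical. The action code for message 1 is not legible in Table 33
as reproduced here, so it is left as None rather than guessed.

```python
# code: (message, source module, action code); contents from Table 33.
ERRORS = {
    1: ('MISSING "CHECK" OR "MEASURE."', "READ", None),  # action not legible
    2: ("UNIDENTIFIED REQUEST.", "READ", "C"),
    3: ("INVALID <FILE ID>.", "READ", "C"),
    4: ("NO SUCH TEXT LINE.", "SCAN", "E"),
    5: ("NO SUCH WORD NUMBER IN TEXT LINE.", "SCAN", "E"),
    6: ("FILE NOT DICTIONARY FILE.", "READ", "D"),
    7: ("FILE NOT PRESENT.", "READ", "D"),
    8: ("FILE NOT TEXT FILE.", "READ", "A"),
    9: ("FILE NOT EXAMPLE FILE.", "READ", "D"),
    10: ("FILE NOT CLICHE FILE.", "READ", "D"),
    11: ("SYNTAX ERROR IN < > SPEC.", "READ", "C"),
}

ACTIONS = {
    "A": "Abort run",
    "E": "Produce final reports and terminate",
    "D": "Use default file; abort if default not present",
    "C": "Continue syntaxing input, then abort",
}


def action_for(code):
    """Return the action description for an error code, or None if unknown."""
    _, _, action = ERRORS[code]
    return ACTIONS.get(action)
```

A call such as action_for(8) yields "Abort run", while action_for(6)
yields the default-file action, reflecting the preference for
continuing a run wherever possible.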
Discussion of CM Program Development


The present work made considerable progress in specifying
the characteristics of a high speed digital computer program which
would allow automatic calculation of the various measures. Such a
program is detailed in the appendices to this report, and its
development seems entirely possible. The required parsing subroutine
represents the most difficult aspect of such a program. However,
the logic for such a program was developed and seems to be
adequate for the purposes on hand. The decision to develop a special
parsing program was made on the basis of the advice of linguistics
and computer analysts who indicated that known and available
parsing programs would: (1) not entirely fill the current
requirements, and (2) consume excessive computer running time. Hence,
the tailored parsing program, as compared with an "off the shelf"
program, was believed to be more cost effective in the long run.

Other aspects of the CM program are relatively
straightforward and seem to possess little risk. Hence, completion of
the programming aspects of the required work seems possible.

The  computer  program  as  designed  will  measure  textual 
characteristics  and  provide  diagnostic  information.  It  will  not 
suggest  alternate  wordings  or  sentence  constructions.  Such  de- 
cisions are  best  left  to  the  technical  writer.  This  point  of  view 
has been expressed recently by Sticht and Zapf (1976), who wrote:

The  computer  seems  to  offer  potential  help, 
especially  if  the  functions  assigned  it  are 
those  the  writer  does  not  like  to  do,  or  do 
well,  and  that  are  compatible  with  computer 
capabilities.  For  example,  computers  can  be 
made  to  do  language  recording  and  storage  tasks 
easily  and  efficiently,  but  language  decision 
tasks  only  with  difficulty  and  poorly,  as  the 
machine  translation  literature  clearly  shows. 

Humans, on the other hand, have just the
opposite proclivities. Consequently, computer
aid to the writer should focus on the former...
APPENDIX A


Summary  Specifications  for  Each  Measure 




COGNITION OF SEMANTIC UNITS


1. Measure number             1

2. Category                   Structure-of-Intellect

3. Abbreviation               CMU

4. Explanation                Type/Token ratio. A function of the number
                              of different words in a block of text, NDWB,
                              and the total number of words in the block,
                              TNWB.

5. Computational complexity   Readily automated

6. Dictionary requirements    Entries will include abbreviations as well as
                              words. Each will be so identified.

7. Symbolic definition        $CMUB = 1 - \frac{NDWB}{TNWB}$

8. Scaling                    $0 < CMUB \le 1$
                              1 = most comprehensible

9. Rules utilized

(1) A word is defined to consist of any number
of consecutive alphanumerical characters preceded
by a space and followed by a comma,
space, exclamation point, colon, or question
mark.

(2) Two words are the same only if they are
spelled exactly the same (i.e., prefixes,
tenses, plurals, etc., will be taken into
account and the words "walk" and "walked"
will be counted as different words).

(3) Abbreviations of multiple words (e.g.,
"USAF," "USSR," and "APA") will each be
counted as one word. (A count of the
number of such abbreviations, NAMB,
determined in calculating CMU will be
retained for later use in calculating ESI.)

(4) Hyphenated words will be counted as
multiple words (i.e., a hyphen will be
considered like a space) except in the
following prefixes, suffixes, and postfixes:

pre-war

post-war
COGNITION OF SEMANTIC UNITS (cont.)


(e.g., "never-to-be-forgotten" is 4 words)

This same rule applies to words containing a
slash such as "upper/lower."

(5) Each numerical value will be counted as
one word (e.g., 4.56).

(6) Capitalization of the first letter of a
word will be ignored. Thus, a word which is
capitalized because it starts a sentence or
appears in a title is counted as the same word
as if it had been composed of all lower case
letters which appear elsewhere. (This is not
true when the word ends in a period - see 13
below.)

(7) A word composed entirely of capital letters
is counted as different from the same word in
lower case letters.

(8) Italics are ignored in word counting; thus,
an italicized word is counted as the same word
as one not italicized.

(9) One-character symbols which occur as one-
symbol words will be counted as one word.

(10) Each word in a spelled out number will be
counted as one word (e.g., "eight hundred"
will be counted as two words, whereas "800,"
from above, is counted as one word).

(11) Tables, figures, maps, diagrams, illustrations
and the like are not considered in this
measure, but titles of these are included.

(12) Words within titles and headers will not
be included in counts of NDWB and TNWB.

(13) All one-word abbreviations composed of any
combination of upper and lower case alphanumeric
letters will be counted as one word.
An abbreviation here is defined to be any word
with or without interspersed periods followed
by a period. (The number of such single word
abbreviations, NASB, is retained for later use
in determining ESI.)

(14) A sentence will be determined by scanning
text and identifying groups of words such
that:

a. the first word starts with an initial
capital letter

b. the last word ends with a period, question
mark or exclamation point, and
is followed by one or more spaces
and then a word with an initial capital
letter.
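
A few of the CMU counting rules can be sketched in code. The following is an illustration only, under stated assumptions: the symbolic definition is taken to be CMUB = 1 - NDWB/TNWB, as the explanation suggests; only rules (2), (4), (6), and (7) are modeled; and the function names split_words, normalize, and cmub are hypothetical rather than part of the CM program.

```python
import re

# Rule (4): a hyphen acts like a space, except after these prefixes
# (the two examples the rule lists: pre-war, post-war).
KEEP_HYPHEN_PREFIXES = ("pre-", "post-")


def split_words(text):
    """Split a text block into words; hyphens and slashes separate words."""
    words = []
    for raw in text.split():
        token = raw.strip('.,;:!?"()')
        if not token:
            continue
        keep_whole = token.lower().startswith(KEEP_HYPHEN_PREFIXES)
        if "/" in token or ("-" in token and not keep_whole):
            words.extend(p for p in re.split(r"[-/]", token) if p)
        else:
            words.append(token)
    return words


def normalize(word):
    """Rule (6): ignore an initial capital. Rule (7): keep all-capital words."""
    if len(word) > 1 and word.isupper():
        return word
    return word[0].lower() + word[1:]


def cmub(text):
    """Assumed form: CMUB = 1 - NDWB / TNWB."""
    words = [normalize(w) for w in split_words(text)]
    tnwb = len(words)          # total number of words in the block
    ndwb = len(set(words))     # rule (2): same word only if spelled the same
    return 1 - ndwb / tnwb
```

For the block "The pump failed. The pump was never-to-be-forgotten." this sketch counts ten words, eight of them different, since "never-to-be-forgotten" contributes four words.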
COGNITION OF SEMANTIC RELATIONS


1. Measure number             2

2. Category                   Structure-of-Intellect

3. Abbreviation               CMR

4. Explanation                Number of shared nouns in a text block, NSNB,
                              per sentence pair plus the number of references
                              in a text block, NORB, divided by the number of
                              words in the block, TNWB.

                              This measures the number of linear relationships
                              in a text block.

5. Computational complexity   Moderate

6. Dictionary requirements    Identify reference words

7. Symbolic definition        $CMRB = \frac{\frac{NSNB}{TNSB - 1} + NORB}{TNWB}$

8. Scaling                    Highest value = most comprehensible
                              $0 \le CMRB \le 1$

9. Rules utilized

(1) The number of references, NORB, is determined
by counting the total number of pronouns. This
will require sentence parsing to determine which
words are used as pronouns.

(2) The number of shared nouns, NSNB, is calculated
by considering adjacent sentence pairs and
tallying the number of times per pair a common
noun occurs in both sentences (plural nouns
are considered as if they were singular in this
process). For a block of N sentences there are
N - 1 sentence pairs. With $n_i$ the number of nouns
appearing in both sentences of pair i, calculate:

$\frac{\sum_{i=1}^{N-1} n_i}{N - 1} = \frac{NSNB}{TNSB - 1}$

This will require sentence parsing in which each
word is identified as a noun or not. Words
which are ambiguous (i.e., could be noun or
another part of speech and parser cannot
distinguish) will be considered as nouns. Multiple
occurrences of a single noun are counted as one
occurrence, e.g., a given noun appearing twice
in one sentence and three times in the adjacent
sentence is tallied as a single occurrence in
determining $n_i$. No test will be made on the
meaning of the nouns, and if the words match
it will be assumed they have the same meaning.

(3) TNWB is the same as was determined in CMU.
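
The CMR tally can be illustrated as follows. This is a sketch under assumptions, not the CM program's method: each sentence is supplied already parsed as (word, part of speech) pairs with the hypothetical tags "NOUN" and "PRON"; the formula is taken to be CMRB = (NSNB/(TNSB - 1) + NORB)/TNWB, as the explanation describes; and singular() is a crude stand-in for the plural rule.

```python
def singular(noun):
    # Crude stand-in for "plural nouns are considered as if they were
    # singular"; a real implementation would use the dictionary.
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun


def cmrb(sentences):
    """sentences: list of sentences, each a list of (word, tag) pairs."""
    tnwb = sum(len(s) for s in sentences)                  # words in block
    norb = sum(1 for s in sentences for _, tag in s if tag == "PRON")

    # One set of nouns per sentence, so repeated nouns count once (rule 2).
    noun_sets = [
        {singular(word.lower()) for word, tag in s if tag == "NOUN"}
        for s in sentences
    ]
    # n_i = nouns appearing in both sentences of adjacent pair i.
    nsnb = sum(len(a & b) for a, b in zip(noun_sets, noun_sets[1:]))

    pairs = len(sentences) - 1                             # TNSB - 1
    shared_per_pair = nsnb / pairs if pairs else 0.0
    return (shared_per_pair + norb) / tnwb
```

With the two sentences "The pump failed." and "It damaged the pumps." the sketch finds one shared noun over one sentence pair and one pronoun among seven words.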


MEMORY OF SEMANTIC UNITS


1. Measure number             3

2. Category                   Structure-of-Intellect

3. Abbreviation               MMU

4. Explanation                This measure is a function of the number of
                              different nouns per block of text, NDNB, and
                              the total number of words in the block, TNWB.

5. Computational complexity   Requires sentence parsing for determination of
                              nouns.

6. Dictionary requirements    Identify parts of speech in all words including
                              all one-syllable symbols.

7. Symbolic definition        $MMUB = 1 - \frac{NDNB}{TNWB}$

8. Scaling                    $0 < MMUB < 1$
                              1 = most comprehensible

9. Rules utilized

(1) Words will be identified as nouns by a
dictionary search and additional parsing as
required.

(2) Two nouns are the same only if they are
spelled exactly the same (i.e., "grass" and
"grasses" will be counted as different words).

(3) Abbreviations will be counted as no more
than a single noun (e.g., "USA," "S.P.C.A.," and
"Soc." are single nouns). Some abbreviations,
not referring to or containing a noun, will not
be counted as a noun (e.g., "i.e."). This will
be determined via a dictionary search.

(4) The following rules from CMU apply:

4 - words with hyphens, slashes

6 - initial caps ignored

7 - all caps

8 - italics ignored

12 - tables, figures ignored

13 - titles, headers excluded

(5) The number of nouns to be counted in
symbols will be contained in the dictionary.

(6) A single noun appearing twice within a
block with different meanings will count as 1
in NDNB.

(7) Any numerical value which begins with the
symbol $ or <£ or ends with C or t will not be
counted as nouns.


EVALUATION OF SYMBOLIC IMPLICATIONS


1. Measure number             4

2. Category                   Structure-of-Intellect

3. Abbreviation               ESI

4. Explanation                This measure is a function of the number of
                              abbreviated or symbolic words in a block of
                              text, NSWB, and the total number of words in
                              the block, TNWB.

5. Computational complexity   Readily automated

6. Dictionary requirements    No unique requirements

7. Symbolic definition        $ESIB = \frac{NSWB}{TNWB}$

8. Scaling                    $0 \le ESIB \le 1$
                              1 = most comprehensible

9. Rules utilized

(1) The number of abbreviated or symbolic
words, NSWB, is equal to the sum of

a. no. of multiple word abbreviations, NAMB
(see item 3 of CMU)

b. no. of single word abbreviations, NASB
(see item 14 of CMU)

c. no. of one character symbols
(see item 9 of CMU)

(2) TNWB is as calculated in CMU.

(3) Numbers with or without decimal points do
not count as abbreviations.

(4) A word starting with $ and ending with C
counts as a single word.


CONVERGENT PRODUCTION OF SEMANTIC IMPLICATIONS


2. Category                   Structure-of-Intellect

3. Abbreviation               NMI

4. Explanation                Measure of the average number of parts of
                              speech per word in a text block. Measures
                              frequency of need for reader to make inference.

5. Computational complexity   Readily automated

6. Dictionary requirements    Parts of speech for each word

7. Symbolic definition        Where TPSB is the total number (not total
                              different number) of parts of speech in all
                              words in a text block.

8. Scaling                    $0 \le NMIB \le 1$
                              highest is most comprehensible

9. Rules utilized

(1) Count past participle as 7 parts of speech,
present participles as 3.


DIVERGENT PRODUCTION OF SEMANTIC UNITS


1. Measure number             8

2. Category                   Structure-of-Intellect

3. Abbreviation               DMU

4. Explanation                Measure of the number of elucidations or
                              explanations per sentence in a block of text
                              = NESS. (More desirable, but not considered,
                              would be the percentage of need for
                              explanations filled.)

5. Computational complexity   Modest

6. Dictionary requirements    Store file of key words which identify that
                              an explanation is forthcoming.

7. Symbolic definition        $DMUB = \frac{\sum_S NESS}{TNSB} = \frac{TNEB}{TNSB}$

8. Scaling                    $0 \le DMUB \le 1$
                              highest = most comprehensible

9. Rules utilized

(1) Count one explanation for each occurrence
of the following word or word combinations:

that is
i.e.
thus
consequently
in other words
therefore
to illustrate
for example

No more than one occurrence of an explanation
is counted per sentence.

(2) Under conditions to be specified, occurrence
of the following words (if used in connection
with an explanation) will be counted as an
explanation:

elucidate
explain
illustrate, illustration
expound
instance
case
example
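
Rule (1) of DMU lends itself to a short sketch. The following illustration assumes the block has already been divided into sentence strings and models only the unconditional cue phrases of rule (1); the names CUES, CUE_RE, and dmub are hypothetical.

```python
import re

# Cue words and word combinations from rule (1).
CUES = [
    "that is", "i.e.", "thus", "consequently",
    "in other words", "therefore", "to illustrate", "for example",
]

# A cue must not be glued to a neighboring word character, so that
# "thus" does not match inside "enthusiast".
CUE_RE = re.compile(
    "|".join(r"(?<!\w)" + re.escape(cue) + r"(?!\w)" for cue in CUES),
    re.IGNORECASE,
)


def dmub(sentences):
    """DMUB = TNEB / TNSB, with at most one explanation per sentence."""
    tnsb = len(sentences)
    tneb = sum(1 for sentence in sentences if CUE_RE.search(sentence))
    return tneb / tnsb
```

Because each sentence contributes at most one explanation, a sentence containing both "thus" and "for example" still adds only 1 to TNEB, which keeps DMUB within the stated 0 to 1 scale.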



YNGVE DEPTH


1. Measure number             9

2. Category                   Psycholinguistic

3. Abbreviation               YD

4. Explanation                In order to determine the depth or complexity
                              of a sentence as defined by Yngve, lines
                              coming out of each node of a parse diagram
                              are numbered 0, 1, ... from right to left. Each
                              word W in the sentence (at the bottom of the
                              diagram) is assigned a value, YD(W), equal to
                              the sum of the numbers along the path from the
                              initial symbol (S) to that word. The Yngve depth
                              of the sentence, YDS, is the average of these
                              numbers.

5. Computational complexity   Difficult since it is based upon need for
                              complete sentence parsing.

6. Dictionary requirements    Parts of speech as required for parsing.

7. Symbolic definition        YD(W) = sum of all digits on parse path
                              to the given word

                              $YDS = \frac{\sum_W YD(W)}{TNWS}$

                              $YDB = \frac{\sum_S YDS}{TNSB}$

                              where TNWS is the total number of words in
                              the sentence.

8. Scaling                    $0 < YDB \le 1$
                              Higher YDs correspond to most comprehensible
                              sentences.

9. Rules utilized

(1) Parse sentence.

(2) Number the lines coming out of each node
0, 1, ... from right to left. Assign to each
word the sum of the numbers along the path from
the initial symbol to the word. YD is the
average of these numbers.

(3) Do not include titles, headers.
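
The branch-numbering rule can be made concrete with a short sketch. The following is an illustration only: the parse tree is assumed to be supplied as nested tuples of the form (label, child, ...), with words as plain strings, and the function names word_depths, yds, and ydb are hypothetical. It computes the raw averages and applies no further scaling.

```python
def word_depths(tree, depth=0):
    """Return (word, YD(W)) pairs, left to right, for a parse tree.

    Branches leaving a node are numbered 0, 1, ... from right to left,
    and YD(W) is the sum of the branch numbers on the path to the word.
    """
    if isinstance(tree, str):
        return [(tree, depth)]
    _label, *children = tree
    count = len(children)
    pairs = []
    for index, child in enumerate(children):
        branch_number = count - 1 - index   # rightmost branch is 0
        pairs.extend(word_depths(child, depth + branch_number))
    return pairs


def yds(tree):
    """Sentence value: average of YD(W) over the words of the sentence."""
    depths = [d for _, d in word_depths(tree)]
    return sum(depths) / len(depths)


def ydb(trees):
    """Block value: average of YDS over the sentences of the block."""
    return sum(yds(t) for t in trees) / len(trees)


SENTENCE = ("S",
            ("NP", ("Det", "The"), ("N", "pump")),
            ("VP", ("V", "failed")))
```

For SENTENCE, the path to "The" passes branch 1 out of S and branch 1 out of NP, giving YD = 2; "pump" gets 1 and "failed" gets 0, so the sentence average is 1.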


MORPHEME DEPTH


1. Measure number             10

2. Category                   Psycholinguistic

3. Abbreviation               MD

4. Explanation                Count of the number of words per text block,
                              TNWB, divided by the number of morphemes in
                              the block, TNMB.

5. Computational complexity   Modest

6. Dictionary requirements    Store number of morphemes for each dictionary
                              entry

7. Symbolic definition        $MDB = \frac{TNWB}{TNMB}$

8. Scaling                    $0 \le MDB \le 1$
                              Highest is most comprehensible

9. Rules utilized

(1) Find morpheme counts per word via dictionary
lookup.

(2) Each numerical value is counted as one
morpheme.

(3) All abbreviations, whether one word (Mr)
or multiple words (USAF) are tallied as a
single morpheme.

(4) Capitalization is ignored in morpheme counting.

(5) A one-character symbol which occurs as a one
symbol word is counted as one morpheme.

(6) Cliches will be counted as one morpheme.


TRANSFORMATIONAL COMPLEXITY


1. Measure number             11

2. Category                   Psycholinguistic

3. Abbreviation               TC

4. Explanation                Determination as to whether each clause is
                              active, passive, passive-negative, or
                              active-negative.

5. Computational complexity   Less than that of parsing, as method depends
                              on partial output of parser.

6. Dictionary requirements    Parts of the verb "to be" must be labeled as
                              such; verbs must be labeled as transitive or
                              intransitive; past participles of verbs must
                              be labeled.

7. Symbolic definition        TCS = 1.00 if sentence is active
                                    0.95 if sentence is passive
                                    0.75 if sentence is active-negative
                                    0.20 if sentence is passive-negative

                              $TCB = \frac{\sum_S TCS}{TNCB}$

                              $TNCB = \sum_{S=1}^{TNSB} TNCS$

8. Scaling                    $0.2 \le TCB \le 1.0$
                              Higher numbers correspond to sentences which
                              are easier to comprehend.

9. Rules utilized

(1) Isolate the complex verb constituent of
the main clause.

(2) If the last two words of this structure are:

(a) the past, present, future or infinitive
of the verb "to be", followed by

(b) the past participle of a transitive
verb, then the sentence is in the
passive voice.

(3) Otherwise, it is in the active voice.

(4) If there is one occurrence of any form
of the word "not" within the complex verb
constituent, then the sentence is considered
negative. These include:

never
not
neither
none
words ending with n't

(5) Thus, if the sentence is both passive and
negative, it is called passive-negative.

(6) If a sentence is ambiguous, and the
different meanings of it lead to different
values for TC, then omit the measure for
that sentence.

(7) Each clause must have a subject and
predicate.
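
The TC scoring rules can be sketched as follows. This is a minimal illustration under assumptions: the complex verb constituent is supplied by a parser as (word, tag) pairs, the tags "BE" (a form of "to be") and "PP_TRANS" (past participle of a transitive verb) are hypothetical labels, and negative words are set aside before the last two words are tested, which is one possible reading of rule (2).

```python
# Score per (voice, negative) combination, from the symbolic definition.
SCORES = {
    ("active", False): 1.00,
    ("passive", False): 0.95,
    ("active", True): 0.75,
    ("passive", True): 0.20,
}

NEGATIVE_WORDS = {"never", "not", "neither", "none"}


def is_negative_word(word):
    word = word.lower()
    return word in NEGATIVE_WORDS or word.endswith("n't")


def classify(verb_group):
    """Return (voice, negative) for a list of (word, tag) pairs."""
    negative = any(is_negative_word(word) for word, _ in verb_group)
    # Assumption: negative words are skipped when looking at the
    # "last two words", so "was not damaged" still reads as passive.
    tags = [tag for word, tag in verb_group if not is_negative_word(word)]
    passive = len(tags) >= 2 and tags[-2:] == ["BE", "PP_TRANS"]
    return ("passive" if passive else "active"), negative


def tcs(verb_group):
    """Transformational complexity score for one clause."""
    return SCORES[classify(verb_group)]
```

For example, the verb group of "The seal was not damaged" scores 0.20 (passive-negative), while that of "The pump failed" scores 1.00.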


CENTER EMBEDDING


1. Measure number             12

2. Category                   Psycholinguistic

3. Abbreviation               CE

4. Explanation                A measure of the number of chained modifying
                              clauses on the right of the subject noun
                              phrase of a sentence.

5. Computational complexity   Equal to that of parsing

6. Dictionary requirements    None beyond those of parsing

7. Symbolic definition        The number of sentences in a block, TNSB,
                              divided by the number of phrases to the right
                              of the subject noun in a sentence, NNPS, summed
                              over all sentences in a block:

                              $CEB = 1 - \frac{\sum_S NNPS}{TNSB}$

                              (The NNPS of NP VP is 0.)

8. Scaling                    $0 \le CEB \le 1$
                              The higher the CE, the easier the sentence.

9. Rules utilized

(1) Parse the sentence.

(2) If the parse is of the following form,
NNPS = 1:


S


*Any sentence element (category)

(3) If the parse is of the following form,
NNPS = 2:


*Any sentence element (category)

RIGHT BRANCHING


1. Measure number             13

2. Category                   Psycholinguistic

3. Abbreviation               RB

4. Explanation                A measure of the number of chained modifying
                              clauses (elaborations) on the right of the
                              object noun phrase of a sentence.

5. Computational complexity   Equal to that of parsing.

6. Dictionary requirements    None beyond those of parsing

7. Symbolic definition        The number of sentences per block, TNSB,
                              divided by the number of chained modifying
                              clauses on the right of the object noun
                              phrase, NCRS, summed over all sentences in a
                              block.

                              $RBB = \frac{TNSB}{\sum_S NCRS + TNSB} = 1 - \frac{\sum_S NCRS}{\sum_S NCRS + TNSB}$

                              (The NCRS of NP VP is 0.)

8. Scaling                    $0 \le RBB \le 1$
                              The higher the RB, the more comprehensible
                              the sentence.

9. Rules utilized

(1) Parse the sentence.

(2) If the parse is of the following form,
NCRS = 1:


S

/\

NP  VP

NP


*Any sentence element (category)


LEFT BRANCHING


1. Measure number             14

2. Category                   Psycholinguistic

3. Abbreviation               LB

4. Explanation                A measure of the number of chained modifying
                              clauses on the left of the subject noun of
                              a sentence.

5. Computational complexity   Equal to that of parsing.

6. Dictionary requirements    None beyond that of parsing.

7. Symbolic definition        The number of sentences per block, TNSB,
                              divided by the number of chained modifying
                              clauses on the left of the subject noun of a
                              sentence, NCLS, summed over all sentences in
                              a block:

                              $LBB = 1 - \frac{\sum_S NCLS}{TNSB}$

                              (The NCLS of NP VP is 0.)

8. Scaling                    $0 \le LBB < 1$
                              The higher the LB, the more comprehensible
                              is the sentence.

9. Rules utilized

(1) Parse the sentence.

(2) Count the number of modifying clauses.

Left branching involves the presence
of clauses modifying the subject noun
phrase of a sentence, as in:

The stogie-chewing dictator laughed.

The presence of such a clause requires
that the parse be of the following form:


NP

/ \

Prt  NP

/ \

NP  Prt


The newly introduced noun phrase may be itself
rewritten in terms of a new noun phrase and a
participle phrase, to yield sentences such as:

hi  sm  -< mitt  ing-s  fc<  git  - ting 
cli  c la  tor  laughed. 

Forms of the type:

The belly-laughing stogie-smoking
dictator laughed.

are not of the left branching variety because
belly-laughing does not modify stogie.

Count: A sentence without the left-branching
feature being described has NCLS = 0. Every
time there is a rewriting of the leading noun
phrase as a participle phrase and noun phrase,
add 1 to NCLS.


M* *.i  ;ui‘« * 


■y 


H I .KM  N 


?.  i.  ■ ! t Ogf  : / 

Abbreviate  n 


lb 

1 y . iisti< 

DC 


4 . ! :xj  iii 


M<  i frequ<  whi  th<  mpleiw  t i - 

"that"  of  a noun  para  e c nt.  hi:  been 

eted  e.g. , i in:  "th<  t hi  un  ■ . 

not  obviou: . . " 


b . Ci  mj  utational  < ■ 1 • >x i t y 
6.  Di<  tionary  r<  juirem  nt 


to  that  parsing. 


mplementizers  must  1 la  - . 


7.  Symbolic  definition 


8.  -fling 


9 . Rules  uti  i z<  1 


DCS  0 if  complement!  y.a  i:,  present 
DCS  1 if  relative  pronoun  j • dej<  nj. 


DCS 


>:dcs 

S 

1 B 


0 £ DCB  jy  1 

Higher  I rrespon 
sentence  . 

( 1 ) Par  o t he  n-  :it  ■ • . 

(?)  11  a*,  any  j 1..  •:  1:  , . 

phrasi  t witl 

i:;  pro  • nt,  th-n  DCS-  0. 

(3)  0th- -rw:  - , DCS  1. 


APPENDIX B


Detailed Specifications for Each Program Module


The regression equations presented in this
section should be regarded as approximate.
These  equations  for  predicting  comprehensi- 
bility for  high  reading  grade  level  readers, 
low  reading  grade  level  readers,  and  for 
both  types  combined  are  preliminary  and 
subject  to  modification  as  additional  work 
with  these  equations  unfolds. 


NAME:             INITIAL

NUMBER:           1

PURPOSE:          The INITIAL module performs all initialization needed
                  to begin a run.

TECHNIQUE:        The INITIAL module determines whether the input
                  is from card or remote terminal and initializes
                  accordingly. All other startup functions required are
                  performed at this time. See Appendix C.

INPUT:            None

FILES
ACCESSED:         Set up INPUTFILE.

GLOBAL
DATA:             None

OUTPUT:           The output of the INITIAL module is the ability to begin
                  measure processing.

MODULES
CALLED:           None

CALLING
MODULES:          None

COMMENTS:         The exact functions necessary in INITIAL depend on the
                  particular implementation system.

SET UP INPUTFILE.
COMPLETE INITIAL
CONDITIONS AS
PER APPENDIX C.


NAME:             READ

NUMBER:           2

PURPOSE:          The READ module processes one input request via SCANINPUT.
                  The input request is syntaxed. If the request is valid,
                  the READ module sets any default (unspecified) internal
                  values and all specified values. If syntax errors are
                  detected, the request is ignored and all errors are
                  reported.

TECHNIQUE:        READ calls SCANINPUT to obtain the first word of the
                  request. If the first word is not "CHECK" or "MEASURE,"
                  an error is reported and the request is ignored.
                  Otherwise, the default values are set for either a CHECK
                  or MEASURE request. The major portion of READ is a
                  "standard" table-driven syntax analysis routine. Each
                  request item is identified by its key word. READ checks
                  that no individual specification is provided more than
                  once.

                  If one or more syntax errors are discovered, READ will
                  report to the user via ERROR. If possible, the remainder
                  of that request will be syntaxed in order to provide the
                  maximum information to the user. If syntaxing cannot
                  be continued, READ will scan the input request (using
                  SCANINPUT) looking for the next key word. Syntax checking
                  will resume when a key word is found, or will stop if a
                  period (ending the current request) is found.

                  When the period ending the request is found, READ will
                  exit if there were no syntax errors or will start over
                  (to process the next request) if there were syntax errors.

INPUT:            One input request ("CHECK ..." or "MEASURE ...")
                  contained on one or more input records (cards or remote
                  terminal transmissions).


I NI  rri  LE  (vi  i AN  INI  IT): 
input  request . 


card  ■ ren  ti  1 i • tai  ing 


TEXT  FILE. 

to 

VI  1 i 1 

: y 

■ • nc  e an<  1 va  lid 

DICTFII.E 

to 

V*  'V  \ ‘ 

1 y 

; : ■ • • md  va  1 id 

OUTPUT! 1 

LE: 

to  rej 

:>or 

• • • : : ( V i 1 : : 

type  of 

req 

uc: 

st  l.e 

inr, 

pc-r  r !.  < AN 

here  a ] 

i * 

o 

f the* 

. 

: i ; i ■ • : : i i . ) 

FXA".t  EE! 

ILF 

to  v< 

• i 

fy  pri  enci  ind  va 

CLICHES!' 

1 1.E 

t o v» 

• 

< md  va 

’y- 


l/o:  r 
w I 1 1 


1 M i t v . 


1 5f> 


i 


I A : A : 


OUTPUT: 


MODULES 

CALLED: 


CALLING 

MODULES 


COMMEN1  : 


L.ot..  up: 

K UK TYPE 

DICTi'I  Li:  title  f.  iwii.i,  verify  }.>•••  .eticv 
rEXTt'II  title  ■ - ' . , verify  presei 

EXAMPLE!' ! !,L  tit  It  £ 1 ! * , verity  pri-nonee 

CIiICHEEILE  titl<  £ iia , verify  pre  en  ■ 

STARTLINE, STARTWORD

ENDLINE, ENDWORD

BLOCKTYPE, BLOCKSIZE
LISTLOC, LISTLINE, LISTTEXT

OUTPUTFILE media
MEASURE[*]

MODE

SAMPLESIZE (only if CHECK).

ABORTPERCENT

NORMCDC, NORMMAN, NORMTO, NORMSG, NORMOVERALL, READERHIGH, READERLOW

COMMENT
MAXPARSE
BLOCKCOUNT
ENDTYPE

The output of the READ module is the initial values of the
global data as described above.


SCANINPUT for next input token.

ERROR to report error.


None


The  size  of  the  READ  m<  dule  is  dependent  on  the  amount  of 
ert  >r  cheeking  to  ! < perfoi  med . 1 1 wi  11  1 i k<  1 v 1 • ■ 4 second 
large  ' - tu  (ai  • ) : it  will  contain  no  new  or 

difficult  concepts,  it  can  be  built  quickly;  initially  Li  in 
servi  tid  1 t lule  debugging  bul  hou  h ivi ■ 

completi  eri  h(  king  ind  eri  i m<  ig<  when  released  to 
the  non'e  din  i c a 1 u ■ : . 
READ request syntax chart (columns: valid key word, further items, action)

Each key word is prefixed in the chart by C, M, or CM. Per the chart
legend, "C" marks an item valid for CHECK but not valid for MEASURE, and
NONE means none of the listed further items is present.

      KEY WORD            FURTHER ITEMS                   ACTION (GLOBAL DATA SET)

CM    USING               <file id>                       TEXTFILE.ID
CM    FROM                START                           STARTLINE <- 1, STARTWORD <- 1
                          LINE #                          STARTLINE <- #, STARTWORD <- 1
                          LINE # WORD #                   STARTWORD <- #
CM    THRU                END                             ENDTYPE <- THRUEND; ENDLINE, ENDWORD
                          LINE # [WORD #]                 ENDLINE, ENDWORD
CM    FOR                 # BLOCKS                        ENDTYPE <- COUNTEND, BLOCKCOUNT <- #
CM    DICTIONARY          FILE <file id>                  DICTFILE.ID
CM    BLOCK               MARKS                           BLOCKTYPE <- CHAR
                          # SENTENCES                     BLOCKSIZE <- #, BLOCKTYPE <- SENTENCES
                          # WORDS                         BLOCKTYPE <- WORDS
                          NONE                            BLOCKTYPE <- WORDS
CM    LIST                PRINTER                         LISTLOC <- PRINTER
                          TERMINAL                        LISTLOC <- REMOTE
M     TEXT                WITH NUMBERS, or NONE           LISTTEXT <- TRUE, LISTLINE <- TRUE
M     RGL                                                 set corresponding MEASURE[.]
M     COMPREHENSIBILITY   MEASURE; SUMMARY, DETAIL,       set corresponding MEASURE[.]
                          or NONE
M     <TEST NAME>                                         set corresponding MEASURE[.]
CM    BATCH                                               MODE <- BATCH
CM    INTERACTIVE                                         MODE <- INTERACTIVE
C     EVERY               # WORDS, or NONE                SAMPLESIZE
                          ON #, or NONE                   ABORTPERCENT
M     EXAMPLE             FILE <file id>                  EXAMPLEFILE.ID <- FILEID
M     CLICHE              FILE <file id>                  CLICHESFILE.ID <- FILEID
M     PARSE               MAX #                           MAXPARSE <- #
CM    COMMENT             <QUOTED STRING>                 COMMENT <- <QUOTED STRING>
M     NORM                MANUALS, TO, OVERALL, NONE,     NORMOVERALL <- FALSE, then NORMMAN,
                          and others                      NORMCDC, NORMTO, NORMSG, or
                                                          NORMOVERALL <- TRUE
CM    READER              HIGH, HIGHLOW, LOW, or NONE     READERHIGH, READERHILOW, READERLOW
                                                          <- TRUE or FALSE


NAME:             SCAN

NUMBER:           3

PURPOSE:          The SCAN module reads the text file to find the next block
                  to process. SCAN also sets up global items necessary
                  to process a new block.

TECHNIQUE:        When SCAN is called the first time (BLOCKNUMBER = 0), SCAN
                  reads the text file looking for word STARTWORD on line
                  STARTLINE. For all other calls (BLOCKNUMBER > 0), the
                  text file is already positioned immediately before the first
                  word of the next block. In either case, all block-level
                  global values are set up for a new block and BLOCKNUMBER
                  is increased by one.

INPUT:            None

FILES
ACCESSED:         TEXTFILE

GLOBAL
DATA:             uses and updates BLOCKNUMBER
                  uses STARTWORD & STARTLINE
                  sets up:

                  NDWB <- 0, TNWB <- 0, NAMB <- 0, NASB <- 0, NSNB <- 0, NDNB <- 0,
                  NSWB <- 0, TNSB <- 0, TNMB <- 0, NPPB <- 0, NWNDB <- 0, TPSB <- 0,
                  TNCB <- 0, NORB <- 0, TNEB <- 0, TCLB <- 0, OSWB <- 0, TSCB <- 0,
                  NDNSAVE <- 0, NDWSAVE <- 0.

OUTPUT:           SCAN sets up the system for the next text block.

MODULES
CALLED:           ERROR to report error.

CALLING
MODULES:          None

COMMENT:          The actual implementation of the "find first text block"
                  part of SCAN will depend on the form of the text file. It
                  is assumed that each line is in a separate logical record
                  and that logical records are numbered starting at 1.


' : 


TI  UK  tQUI 


INi  ■ : : 


'j'Yi';’-  RE 


SC  All  TUI  i n . tly  a ess<  isei 

i n j ■ . i ■ I ■ AN  I N I ' tiled , it  return  t h<  n ext 

. . ' th<  i ei  in]  . • . When  v • m es!  try  it  will  i 
the  in:  ut  reo  rj  (cud  or 


The exact form of SCANINPUT is dependent on the
particular implementation system. A token in the user input is
defined as one of:


a string of letters terminated by a blank or end of input
record or a non-letter. (A letter is A,B,C,D,E,...,Z.)
This type of token is called a word.

a string of digits (0,1,2,...,9) terminated by a blank
or end of input record or a non-digit. This type of
token is called a number.

a single character which is not a letter, digit, or
blank, called a special symbol.

a <file id> (in the form specified by the particular
implementation), called a file id.
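
The token definitions above can be sketched with a regular expression. The following is an illustration only, not the SCANINPUT design: the <file id> form is implementation dependent and is omitted, the type names WORD, NUMBER, and SYMBOL stand in for the module's own type codes, and scan_tokens is a hypothetical name.

```python
import re

# One alternative per token type; blanks match nothing and are skipped.
TOKEN_RE = re.compile(
    r"(?P<WORD>[A-Z]+)"            # string of letters A..Z
    r"|(?P<NUMBER>[0-9]+)"         # string of digits 0..9
    r"|(?P<SYMBOL>[^A-Z0-9\s])"    # any single other non-blank character
)


def scan_tokens(record):
    """Yield (type, token, value) for each token in one input record.

    value is the integral value of a number token and None otherwise,
    in the spirit of the token, type, and value items SCANINPUT sets up.
    """
    for match in TOKEN_RE.finditer(record):
        kind = match.lastgroup
        token = match.group()
        value = int(token) if kind == "NUMBER" else None
        yield kind, token, value
```

Because a word ends at the first non-letter and a number at the first non-digit, an input such as "LINE012." yields a word, a number with value 12, and the special symbol ".".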


SCANRECORD - current input record.

SCANPOINTER - current place in SCANRECORD.

SCANSAMETOKEN - if TRUE, SCANINPUT will not get the
next token but will return the same token.


INPUTFILE.


SCANRECORD (changed if a new input record is needed).

SCANPOINTER (updated unless SCANSAMETOKEN is TRUE).
SCANINTYPE set up.

SCANINVALUE set up if token is a number.

SCANSAMETOKEN <- FALSE.

SCANINTOKEN set up.


SCANINPUT gets the next token from the user input. The
token itself is placed in SCANINTOKEN. Its type is placed
in SCANINTYPE and will be returned to the calling
module. If the token is a number, SCANINVALUE contains the
integral value represented by the token. For example, if
SCANINTOKEN is "01" then SCANINVALUE is 1.


il  i 


i 


LE 
A l LI  : 


N 


CALLING 
::  I LK  : 

COMMENTS : 


kf:ad 

SEARCH 

rn  general  , ! ANINI  . is  m easy  inoduJ 
: icul  ty  can  b<  inti  luced  by  thi 
or  syst<  m.  . :AKIH1  1 in  a!  print  < i< 


to  write, 
cc  o I J Ui 
h input  ] iii. 


[Flowchart: SCANINPUT]

SCANINTYPE ← TOKENISEOF


SCANINTYPE ← TOKENISWORD
PLACE WORD (TO NEXT NON-LETTER)
IN SCANINTOKEN.
UP SCANPOINTER TO NEXT NON-LETTER


SCANINTYPE ← TOKENISNUMBER
PLACE NUMBER (TO NEXT NON-DIGIT)
INTO SCANINTOKEN.
UP SCANPOINTER TO NEXT NON-DIGIT
PLACE VALUE OF NUMBER INTO
SCANINVALUE.


SCANINTYPE ← TOKENISFILEID
PLACE <FILE-ID> IN SCANINTOKEN
UP SCANPOINTER PAST <FILE-ID>


(MUST BE SPECIAL SYMBOL)
SCANINTYPE ← TOKENISSYMBOL
PLACE SINGLE CHARACTER INTO
SCANINTOKEN.
UP SCANPOINTER TO NEXT CHARACTER
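
The token classification described for SCANINPUT can be sketched in Python as follows. This is only an illustrative sketch, not the report's implementation: the names `Token` and `scan_record` are hypothetical, lower-case letters are accepted as letters (the specification lists only A to Z), and the implementation-dependent < file id > token is omitted.

```python
import string
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    kind: str                    # "word", "number", or "symbol"
    text: str                    # the token itself (SCANINTOKEN)
    value: Optional[int] = None  # integral value for numbers (SCANINVALUE)


def scan_record(record: str) -> List[Token]:
    """Split one input record into word, number, and special-symbol tokens."""
    tokens: List[Token] = []
    pos = 0  # plays the role of SCANPOINTER
    while pos < len(record):
        ch = record[pos]
        if ch == " ":
            pos += 1  # blanks only separate tokens
        elif ch in string.ascii_letters:
            end = pos
            while end < len(record) and record[end] in string.ascii_letters:
                end += 1  # advance to the next non-letter
            tokens.append(Token("word", record[pos:end]))
            pos = end
        elif ch in string.digits:
            end = pos
            while end < len(record) and record[end] in string.digits:
                end += 1  # advance to the next non-digit
            text = record[pos:end]
            tokens.append(Token("number", text, int(text)))
            pos = end
        else:
            tokens.append(Token("symbol", ch))  # any other single character
            pos += 1
    return tokens
```

For instance, `scan_record("FROM LINE 0100, END")` yields two words, the number `0100` with value 100, the special symbol `,`, and a final word.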


MAM  : 


RESET 


N " : 

: 

T:  MHMi  MM 
FILES 

. : 

(GLOBAL 

DATA: 


OUTPUT: 

M DULE 
.' AIMED: 

A l LIM  • 

: 


6 


The  RESET  module  ret.;  up  all  MM  «: 
a n • • 


iten  n<  < 


entenct  aitei  irt  r«  et  ■ 
r . ; a l I • • -m  ar-  !<•  «r>  : • 


M ■ 


M 


• ts  up: 

TMM?  - :M  !-  + 1 
MPI  • 

Mr  • 1 

: mw  - o 

y D-';  * o 

T . • 0 

MM;  . * 0 
NCR.  * 0 
K ‘0 
I * 0 

»5  thi  ystet 


• • • ...  ■.■>;•  :• 


N n< 


None 


UiV 


Ldt< 


1 (ip 


N AM  I : : 

NUMBER: 

PURPOSE: 

TECHNIQUE: 


INPUT: 

FILES 
ACCI  D : 


SEARCH 

7 

The SEARCH module looks up a single text word in the
dictionary and adds the dictionary information for that
word to the Sentence Data Array.

SEARCH adds one to TNWS (total number of words per sentence)
and uses TNWS to index the Sentence Data Array. It finds
the next text word and places it in WORD[TNWS]. It then
searches the dictionary for the text word. The exact form of
the search will depend on the details of the dictionary.

If the word is in the dictionary file, all information will
be added to the entry in the Sentence Data Array.

If the word is not in the dictionary, the action taken will
depend on the mode. If batch, the Sentence Data Array entry
will be marked as containing an unknown word. The various
items will be set to their default values:

NOMORE ← 1
NOPARTS ← 0
PART[...] ← all 0
NOSYLLABLES ← [# letters/3] + 1
NEGIND ← no
SYMBOLIND ← no
NOWORDS ← 1
NOREFS ← 1
EXAMPLE ← no
CLICHE ← no
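
The default entry for an unknown word can be sketched in Python as below. This is a sketch under stated assumptions: the function name `default_entry`, the `UNKNOWN` flag, and the constant `MAX_PARTS` are hypothetical, and the square brackets in `[# letters/3]` are read as "integer part of", which the specification does not state explicitly.

```python
MAX_PARTS = 4  # hypothetical size of the PART array; not given in the text


def default_entry(word: str) -> dict:
    """Build a Sentence Data Array entry for a word not in the dictionary."""
    letters = sum(1 for ch in word if ch.isalpha())
    return {
        "WORD": word,
        "UNKNOWN": True,                  # marks the entry as an unknown word
        "NOMORE": 1,
        "NOPARTS": 0,
        "PART": [0] * MAX_PARTS,          # all 0
        "NOSYLLABLES": letters // 3 + 1,  # [# letters/3] + 1, integer part assumed
        "NEGIND": False,
        "SYMBOLIND": False,
        "NOWORDS": 1,
        "NOREFS": 1,
        "EXAMPLE": False,
        "CLICHE": False,
    }
```

Under this reading, a nine-letter word such as "gyroscope" is assigned 9 // 3 + 1 = 4 syllables by default.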

If the run is in the interactive mode and the word is not
in the dictionary, the user will be asked to supply the
necessary information. If the user supplies the information,
the supplied information will be used and saved in
the dictionary for possible later use. If the user chooses
not to supply the information, the various items will be
assigned their default values as defined above.

-Next  text  word. 


text  file 
dictionary  file 
cliche file
example  file 

input  file  (only  if  not  in  dictionary) 


GLOBAL 
DATA:


OUTPUT: 

MODULES 
CALLED:


...  , . • ,iv  ,-i  ;ie  entry  to  t n • - -t.*enc--  Da*  a n:  : - 

■ : w ■ . not  in  U t i nary,  if  iat<  NViSDB  and  N A-  ■ 

: V H ><  • It  ne  entry  t th  enten  e Data  Art  »y. 

\ if  word  not  in  t ionary  ••  • 
i L.  interactive 


TALLIN  ’> 
MOI  LES: 

3MKI  NT 


•,  • i. . ; fv  'H  i . directly 

; mj  . • ■ ntation  It  

Ly : 

1.  dict  ionary  *.  rm  . „ 

xity  : "w  i i-n  >t-m-di  t nary 

h.  entire  Pt  em  f intera  tiv,  il  ti  nary  ipdat.  ight 
r the  first  version.  When  i ■ idde  • 
raments  for  th<  modul< 

. . : , • . , • ;;;AR'  ii  will  f-ad  it  an  : 

1:  i -w  t'  xt  ; ■ r • - ■ ’ 

. ...  • . > • 

write  t:  f,x  ■ ■ * • 
NAME:


COUNT


NUMBER: 

PURPOSE: 

TECHNIQUE: 


INPUT: 

FILES 

ACCESSED: 

GLOBAL, 

DATA: 

OUTPUT: 

MODULES 

CALLED: 

CALLING 
MODULES : 


8 

The COUNT module maintains running totals for sentence and
block summaries.


COUNT  updates 

NDWB  (different  words  in  block) 

TNWB  (number  words  in  block) 

TNMB  (number  morphemes  in  block) 

NAMB  (number  multi-word  abbreviations  in  block)
NASB  (number  single  word  abbreviations  in  block) 
NDNB  (number  of  different  nouns  in  block) 

TPSB  (number  of  parts  of  speech  in  block) 

TNEB  (number  of  elucidations) 

NORB  (right  branching) 

TSCB  (number  of  syllables) 

OSWB  (number  of  one  syllable  words) 

TNCB  (number  of  characters) 

TCLB  (number  of  clauses) 

NSWB  (number  of  total  abbreviations) 


and  adds  the  next  entry  in  the  BLOCKWORD  table;  updates 
BLOCKWORD  table  for  words  not  in  dictionary. 


Current  Sentence  Data  Array  entry  (as  defined  by  TNWS). 


None 


NDWB, TNWB, TNMB, NAMB, NASB, NDNB, TPSB,
OSWB, TNCB, TCLB, NSWB, TNEB, NORB, TSCB,
Sentence Data Array entry,
Blockword table.


Updated counters.


None 


None 




COMMENTS: 


None 


NAME: 


PARSE


NUMBER: 
PURPOSE : 

TECHNIQUE: 


9 

The PARSE module is intended to parse fully each sentence
presented to it. It attempts to reduce to a minimum the
number of potential parse-trees and produces a representation
of the parse-tree for each of the possible parses of the sentence.

The PARSE module consists of a parser and a preparser. It
makes every possible assignment of categories (parts of speech)
to each word of the sentence. Each assignment is passed
to the preparser, which either eliminates it or imposes on
it a preliminary structure.

The  parser,  by  means  of  repetitive  applications  of  parsing 
rules  (Appendix  I)  derives  a single  tree  structure  for  each 
possible  reading  of  the  sentence.  The  tree  structure  is 
embodied  in  a PARSE-TABLE. 

The  parser  maintains  an  array  called  SUBSTRING,  whose 
elements  are  the  categories  at  the  top  of  the  portion  of  the 
tree  at  any  given  state  of  completion.  It  attempts  to  match 
the  categories  at  the  left  of  SUBSTRING  with  the  right  sides 
of  the  parsing  rules.  When  a match  is  found,  the  left  side 
of  the  parsing  rule  is  entered  into  the  PARSE-TABLE,  indicat- 
ing that  the  tree  has  been  extended  upward. 

When  at  a given  point  a match  is  impossible,  a back-tracking 
routine  deletes  the  last-entered  category  from  PARSE-TABLE 
and  resumes  the  attempt  to  find  a match,  starting  with  the 
rule  immediately  following  the  one  (in  the  list  of  parsing 
rules)  that  was  used  in  the  entry  of  the  newly-deleted  rule. 

A similar  technique  is  used  in  order  to  find  additional 
parses  of  the  same  sentence  using  the  same  sequence  of  cate- 
gories. A copy  of  PARSE-TABLE,  which  is  complete  for  a 
given  parse  is  made,  and  deletions  and  additions  to  this 
copy  are  made  to  get  additional  parses.  The  process  is 
repeated until no additional parses are possible, or until
the maximum number of parses (see PARSE LIMIT SPEC, Appendix
C) has been reached.

GET-SUBSTRING finds the arrays SUBSTRING and SUBSTRING-LINES.

If FIRST-LINE is not 1, then only the categories from the
left-most point are desired, so they can be erased in
backtracking.
FOLLOW-PATH starts with the word I of the current
sentence, and goes upward from there through the PARSE-TABLE.
SIGNAL is set to 1 if this is the left-most
path to its highest point, and to -1 otherwise. CAT and
LINE are the category at this highest point and the line
in PARSE-TABLE on which it is entered.

BEGIN-RULES  tries  to  find  a match  for  the  string  SUBSTRING. 

It  begins  with  the  rule  numbered  FIRST-RULE  and  goes 
through  each  rule  from  that  point  on.  If  a rule  has  a 
length  LENGTH(I),  then  the  first  LENGTH(I)  symbols  from 
SUBSTRING  are  used.  The  procedure  calls  ENTER  when  a 
match  is  found,  and  BAD-MATCH  when  no  match  is  found. 

ENTER  begins  a new  line  in  PARSE-TABLE  and  places  pointers 
on  previous  lines  which  correspond  to  categories  which 
lead  upward  to  the  new  category.  It  also  places  a 1 
in  column  3 of  the  line  of  the  left-most  of  these 
categories,  and  -1  in  column  3 of  the  others.  On  the  new 
line,  it  enters  the  category  in  column  1 , 0 in  columns 
2 and  3,  and  the  number  of  the  rule  just  used  in  column  4. 

GOOD-MATCH  checks  to  see  if  the  tree  has  been  completed 
without  arriving  at  S.  If  this  happens,  BACKTRACK  is 
called.  If  the  tree  is  completed  with  S,  then,  FINISHED- 
PARSE  is  called.  Otherwise,  parsing  is  resumed  by  getting 
a new  SUBSTRING  and  calling  BEGIN-RULES. 

FINISHED-PARSE  is  called  each  time  a parse  has  been  found. 

BAD-MATCH drops the first element of SUBSTRING, if this is
possible, and calls BEGIN-RULES to look for new matches.
If SUBSTRING has only 1 element, BACKTRACK is called.

BACKTRACK erases all references in PARSE-TABLE to its last
line, and erases the last line itself. It then prepares
to look for a new match at the point where the last one
was found, but starting with the next rule in the list.

If there is nothing to erase, NO-PARSE is called.

MAIN is the main procedure.

NO-PARSE is called when no parse can be found for the
given assignments of categories to the words of the
sentence.
INPUT:


FILES 

ACCESSED: 


OUTPUT: 


MODULES 

CALLED: 

CALLING 

MODULES: 


The input is the Sentence Data Array from the SEARCH
module.


SUBSTRING - an array whose elements are the categories at
the top of the current sub-tree

SUBSTRING-LINES  - An  array  whose  elements  are  the  numbers 
of  the  lines  of  PARSE-TABLE  which  correspond  to  the 
elements  of  SUBSTRING 

SUBSTRING-LENGTH - the number of elements in SUBSTRING

NUMWDS  - the  number  of  words  in  the  current  sentence 

NUMRULES  - the  number  of  context-free  rules 

RULE  - is  a two  dimensional  array  which  has  a row  for  each 
rule,  and  each  row  has  as  many  elements  as  there  are 
categories in that rule. For example, the row corresponding
to S ← NP VP has two elements.

PARSE-TABLE  - is  a two  dimensional  array  which  is  a record 
of  the  developing  parse 

PARSE-LINE  - is  the  line  of  the  PARSE-TABLE  in  current  use 

LENGTH - an array whose ith element is the number of categories
in the ith rule.

LEFT  - an  array  whose  ith  element  is  the  category  to  the 
left  of  the  arrow  in  the  ith  rule 

NUMASS  - the  number  of  possible  combinations  of  assignments 
of  categories  to  the  words  of  the  sentence. 

The  PARSE  module  produces  a copy  of  the  PARSE-TABLE  array  for 
each  possible  parse  of  the  sentence  and,  if  possible, 
a sentence  diagram. 


None 


None 


PREPARSER


The  preparser  has  two  goals: 

1.  To eliminate certain impossible sequences of
categories before entry into the parser.

2.  To apply a limited number of frequently applicable
context-sensitive parsing rules, thereby giving a
preliminary structure to the sentence. This preliminary
structure is not subsequently altered by
the parser.

In order to accomplish the first goal, every member of a list of impossible
category sequences is compared with each new sequence of categories that comes
under consideration. If at any point there is a match, the entire current
sequence is eliminated and the next assignment of categories to the words
of the sentence is made.
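
The first preparser goal can be illustrated with a short Python sketch. It is an illustration only: the function names and the category labels in the usage note are hypothetical, and a "match" is assumed to mean that an impossible sequence occurs as a contiguous run inside the current assignment.

```python
from itertools import product
from typing import Iterable, Iterator, Sequence, Tuple


def contains_sequence(assignment: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if pattern occurs as a contiguous run inside assignment."""
    n = len(pattern)
    return any(
        tuple(assignment[i:i + n]) == tuple(pattern)
        for i in range(len(assignment) - n + 1)
    )


def possible_assignments(
    word_categories: Sequence[Sequence[str]],
    impossible: Iterable[Sequence[str]],
) -> Iterator[Tuple[str, ...]]:
    """Yield every assignment of one category per word that survives the filter."""
    impossible = list(impossible)
    for assignment in product(*word_categories):
        if any(contains_sequence(assignment, bad) for bad in impossible):
            continue  # eliminated; move on to the next assignment
        yield assignment
```

With `word_categories = [["DET"], ["N", "V"], ["N", "V"]]` and the single impossible sequence `("DET", "V")`, only the two assignments beginning DET N are passed on to the parser.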

To  accomplish  the  second  goal,  the  list  of  parsing  rules  is  compared  in 
turn  with  the  current  sequence  of  categories.  When  a match  is  found,  the 
new  left  category  is  entered  into  PARSE-TABLE  in  the  same  manner  as  this 
entry  is  performed  by  the  parser. 
[Flowchart: PARSE module]

ARE THERE MORE ASSIGNMENTS OF CATEGORIES?

MAKE NEW ASSIGNMENT OF CATEGORIES TO WORDS OF SENTENCE

PREPARSE

IS ASSIGNMENT POSSIBLE?

START WITH RULE R; FIND EARLIEST RULE WHICH MATCHES LEFT
END OF SUBSTRING

IS THERE SUCH A RULE?

IS LENGTH OF SUBSTRING > 1?

DELETE FIRST ELEMENT OF SUBSTRING

ENTER NEW CATEGORY AND RULE IN PARSE-TABLE

IS NEW CATEGORY = "S"?

ARE THERE MORE ROWS OF PARSE-TABLE TO DELETE?

IS TREE COMPLETE?

R ← RULE OF LAST ENTRY IN PARSE-TABLE

DELETE LAST ROW OF PARSE-TABLE; FORM SUBSTRING

MAKE COPY OF PARSE-TABLE

NAME: 


MEASURE/SI 


NUMBER: 

PURPOSE: 

TECHNIQUE: 


INPUT: 

FILES 

ACCESSED:

GLOBAL 

DATA: 


OUTPUT: 


10 

The MEASURE/SI module calculates the six structure of
intellect measures.

All  input  required  to  compute  the  structure  of  intellect 
measures  has  been  previously  obtained.  The  actual  measures 
are  calculated: 

CMU ← 1 - NDWB/TNWB

CMR ← (NSNB/(TNSB - 1) + NORB)/TNWB

MMU ← 1 - (NDNB/TNWB)

ESI ← 1 - (NSWB/TNWB)

NMI ← (TNWB/TPSB)

DMU ← TNEB/TNSB
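
As a worked illustration, the six formulas can be written as a small Python function. This is a sketch rather than the report's code: the function name and the dictionary-of-counters interface are assumptions, and the check that TNSB exceeds 1 is added here only to avoid division by zero in CMR.

```python
def structure_of_intellect(c: dict) -> dict:
    """Compute the six nonnormalized structure of intellect measures.

    c maps block counter names (NDWB, TNWB, NSNB, TNSB, NORB, NDNB,
    NSWB, TPSB, TNEB) to their values.
    """
    tnwb = c["TNWB"]
    tnsb = c["TNSB"]
    if tnsb <= 1:
        raise ValueError("CMR needs more than one sentence in the block")
    return {
        "CMU": 1 - c["NDWB"] / tnwb,
        "CMR": (c["NSNB"] / (tnsb - 1) + c["NORB"]) / tnwb,
        "MMU": 1 - c["NDNB"] / tnwb,
        "ESI": 1 - c["NSWB"] / tnwb,
        "NMI": tnwb / c["TPSB"],
        "DMU": c["TNEB"] / tnsb,
    }
```

For a hypothetical block with TNWB = 100, NDWB = 60, NSNB = 8, TNSB = 5, NORB = 2, NDNB = 20, NSWB = 5, TPSB = 8, and TNEB = 1, this gives CMU = 0.4, CMR = 0.04, MMU = 0.8, ESI = 0.95, NMI = 12.5, and DMU = 0.2.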

None 


None 


Uses  : 

NDWB 

TNWB 

NSNB 

TNSB 

NORB

NDNB 

NSWB 

NPPB 

TNEB 

The output of MEASURE/SI is the six nonnormalized structure of
intellect measures.
NAME:


MEASURE/P 


NUMBER:  11 

PURPOSE:  The MEASURE/P module calculates the seven psycholinguistic
comprehensibility measures. The submodules, each of which
computes one measure, are described individually as numbers 11A -
11G below:


NAME:  YNGVE  DEPTH  (YD) 

NUMBER:  11A 

PURPOSE:  The  purpose  of  this  subroutine  is  to  compute  the  Yngve 

depth  of  each  sentence  in  a block. 

TECHNIQUE:  Each path from the top-most symbol down to the words in the
sentence is examined, and the YD along all paths summed.
The Yngve depth of a sentence is the sum divided by the number
of words in the sentence.
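
A minimal Python sketch of this computation follows. It rests on assumptions not stated in the specification: the parse is supplied as nested tuples `(category, child, ...)` with words as plain strings (a hypothetical stand-in for PARSE-TABLE), and the depth contributed by each branch follows Yngve's usual numbering, in which the branches under a node are numbered from right to left starting at 0.

```python
from typing import List, Union

Tree = Union[str, tuple]  # a word, or (category, child, child, ...)


def yngve_word_depths(tree: Tree, inherited: int = 0) -> List[int]:
    """Return the depth of each word, summing branch numbers along its path."""
    if isinstance(tree, str):
        return [inherited]  # reached a word
    children = tree[1:]
    depths: List[int] = []
    n = len(children)
    for i, child in enumerate(children):
        # The left-most of n branches is numbered n - 1, the right-most 0.
        depths.extend(yngve_word_depths(child, inherited + (n - 1 - i)))
    return depths


def yngve_depth(tree: Tree) -> float:
    """Sum of the word depths divided by the number of words."""
    depths = yngve_word_depths(tree)
    return sum(depths) / len(depths)
```

For the tree `("S", ("NP", ("DET", "the"), ("N", "dog")), ("VP", ("V", "barks")))` the word depths are 2, 1, and 0, so the sentence value is 1.0.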

INPUT:  None 

FILES 

ACCESSED:  Cliche 

GLOBAL 

DATA:  PARSE  TABLE 

NUMWDS 

OUTPUT:  Output  of  the  sub-module  is  the  nonnormalized  YD  value. 

MODULES 

CALLED:  None 


CALLING 

MODULES:  None 


NAME:  MORPHEME  DEPTH  (MD) 

NUMBER:  11B 

PURPOSE:  Calculation of the psycholinguistic measure MD.

TECHNIQUE:  The number of morphemes in the sentence is divided by the
number of words.

INPUT:  None 

FILES 

ACCESSED:  Cliche 

GLOBAL 

DATA:  Sentence Data Array

NUMWDS 

OUTPUT:  Output  of  the  sub-module  is  the  nonnormalized  MD  value. 

MODULES 

CALLED:  None

CALLING 

MODULES:  None 

NAME:  TRANSFORMATIONAL COMPLEXITY (TC)

NUMBER:  11C 

PURPOSE:  The purpose of the submodule is the calculation of the
psycholinguistic measure TC.

TECHNIQUE:  Certain  of  the  rules  will  be  listed  as  having  one  of  the 

features  "passive,"  "active-negative,"  "passive-negative." 

For  each  parse,  search  the  rules  used  (column  4 of  PARSE- 
TABLE)  and  if  any  of  the  features  is  present,  TC  is  .95,  .75, 
or  .20  respectively.  Otherwise,  TC=  1. 

INPUT:  None 

FILES 

ACCESSED:  File  of  parsing  rules 

GLOBAL 

DATA:  PARSE  TABLE 

OUTPUT:  The  output  is  the  nonnormalized  TC  value. 

MODULES 

CALLED:  None 

CALLING 

MODULES:  None 
NAME:


CENTER  EMBEDDING  (CE) 


NUMBER:  11D 

PURPOSE:  The purpose of this submodule is to calculate the value
of NNPS, required for the subsequent determination of CE.

TECHNIQUE:  The  subject  noun  phrase  of  the  sentence  is  located,  and 

the  number  of  phrases  branching  off  to  the  right  from  this 
phrase  is  determined. 

INPUT:  None 




FILES 

ACCESSED: 

None 

GLOBAL 

DATA: 

PARSE  TABLE 
NUMWDS 

OUTPUT: 

NNPS 

MODULES 
CALLED : 

None 

CALLING 

MODULES: 

None 
NAME:  LEFT BRANCHING (LB)

NUMBER:  11E

PURPOSE:  The submodule's purpose is to compute the value of NCLS,
required for computation of LB.

TECHNIQUE:  The  subject  noun  phrase  of  the  sentence  is  located,  and 

the  number  of  phrases  branching  off  to  the  left  from  this 
phrase  is  determined. 


INPUT: 

None 

FILES 

ACCESSED: 

None 

OUTPUT: 

NCLS 

MODULES 

CALLED: 

None 

CALLING 

MODULES: 

None 
NAME:


RIGHT  BRANCHING  (RB) 




NUMBER: 


11F 


PURPOSE: 


The purpose of the submodule is to compute the value of
NCRS required for the computation of RB.


TECHNIQUE: 


The  object  noun  phrase  of  the  sentence  is  located.  The 
number  of  phrases  branching  to  the  right  from  the  object 
noun  phrase  is  determined. 


INPUT:  None 


FILES 

ACCESSED:  None 

GLOBAL 

DATA:  Uses: 

PARSE  TABLE 


OUTPUT:  NCRS 

MODULES 

CALLED:  None 

CALLING 

MODULES:  None 
NAME:  DELETED COMPLEMENT (DC)


NUMBER:  11G 

PURPOSE:  The  purpose  of  the  submodule  is  calculation  of  the 

psycholinguistic  measure  DC  of  a sentence. 

TECHNIQUE:  Certain of the rules will be listed as having a deleted
complement feature. For each parse, the rules are searched
(column 4 of PARSE TABLE) and if any rule used has this
feature then DC = 0. Otherwise, DC = 1.

INPUT:  None 

FILES 

ACCESSED:  File  of  parsing  rules:  RULE 

GLOBAL 

DATA:  Uses: 

PARSE  TABLE 

OUTPUT:  The  output  is  the  DC  value  of  the  examined  sentence. 

MODULES 

CALLED:  None 

CALLING 

MODULES:  None 
NAME:

NUMBER: 

PURPOSE: 

TECHNIQUE: 

INPUT: 

FILES 

ACCESSED: 

GLOBAL 

DATA: 

OUTPUT: 

MODULES 

CALLED: 

CALLING 

MODULES: 

COMMENT: 


SENTSUM 

12 

The SENTSUM module accumulates and summarizes the results of
a single sentence.

SENTSUM  updates  all  block  level  statistics  for  each  sentence 
that  has  not  been  updated  previously. 

None 


None 

TNSB updated, NCRB, NCLB, NCRS, NCLS.

The  output  of  the  SENTSUM  module  is  the  updated  global  items. 
None 


None 

The work of SENTSUM is distributed in other modules, primarily
COUNT.

NOTE:  NSNB  is  calculated  at  the  end  of  a block,  not  at  the 

end  of  each  sentence. 
NAME:


SENTOUT 


NUMBER: 

PURPOSE: 

TECHNIQUE: 

INPUT: 

FILES 

ACCESSED: 

GLOBAL 

DATA: 


OUTPUT: 

MODULES 

CALLED: 

CALLING 

MODULES: 

COMMENTS: 


13 

The SENTOUT module lists the results of processing each sentence in
a block.

SENTOUT  is  a report  generator  using  values  previously  calculated. 
None 


OUTPUT  FILE 


Uses:  (for  each  sentence) 

TNSB 

TNWS 

NEPS 

NESS 

YDS 

TCS 

NNPS 

NCRS 

NCLS 

DCS 

For  header  line: 

BLOCKNUMBER 
TEXTFILE  name 

The output from SENTOUT is a report on each sentence in a
block.


None 


None 

Items  used  only  by  SENTOUT  but  required  to  be  saved  between 
calls  are: 

NPST 

NEST 

YDT 

TCT 
NCLT

DCT 

Each item is the total of the corresponding item with the
final "T" replaced by "S."


NAME:  RGL 

NUMBER:  14 

PURPOSE:  The RGL module calculates the reading grade level for a
block of text.

TECHNIQUE:  The reading level is calculated by three different
formulas. The selected formulas are:

1.  FORECAST [Caylor, Sticht, Fox, & Ford; 1972]
FORECAST RGL = 20 - [OSWB x (150/TNWB)] / 10

2.  ARI (Automatic Readability Index) RGL
[Smith & Senter; 1966]

ARIRGL = 0.5 x (TNWB/TNSB) + 4.71 x (TNCB/TNWB) - 21.43
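
The two formulas listed above can be sketched in Python as follows. The function names are hypothetical. The division by 10 in the FORECAST function follows the published FORECAST formula (20 minus one tenth of the number of one-syllable words in a 150-word sample) and should be treated as an assumption relative to the formula as printed in this specification.

```python
def forecast_rgl(oswb: int, tnwb: int) -> float:
    """FORECAST reading grade level for a block.

    oswb: number of one-syllable words in the block
    tnwb: total number of words in the block
    """
    one_syllable_per_150 = oswb * (150 / tnwb)  # scale the count to 150 words
    return 20 - one_syllable_per_150 / 10


def ari_rgl(tnwb: int, tnsb: int, tncb: int) -> float:
    """Automatic Readability Index reading grade level for a block.

    tnwb: total words, tnsb: total sentences, tncb: total characters (letters)
    """
    return 0.5 * (tnwb / tnsb) + 4.71 * (tncb / tnwb) - 21.43
```

For a hypothetical block of 150 words in 10 sentences with 750 letters and 100 one-syllable words, the FORECAST level is 10.0 and the ARI level is 0.5 x 15 + 4.71 x 5 - 21.43 = 9.62.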
GLOBAL
DATA:

TNWB   total number of words in block
OSWB   number of one syllable words in block
TNSB   total number of sentences in block
TNCB   total number of characters (letters) in block
TSCB   total number of syllables in block

OUTPUT:  The output of the RGL module is the reading grade levels
and indexes:

FORCASTRGL
ARIRGL
FLESCHRL
ARI

MODULES 

CALLED:  None 

CALLING 

MODULES:  None 

COMMENTS:  The  RGL  and  RGLOUT  modules  can  be  combined. 


NAME:

NUMBER: 


RGLOUT 

15 


PURPOSE:  RGLOUT  displays  the  reading  grade  level  for  each  block. 

TECHNIQUE:  RGLOUT is a report generator using values previously
calculated.

INPUT:  None 

FILES 

ACCESSED:  OUTPUT  FILE 

GLOBAL 

DATA:  Uses  three  RGL  values  and  indexes  as  calculated  by  RGL. 

OUTPUT:  The output of RGLOUT is a report on the block's reading grade
level.

MODULES 

CALLED:  None 


CALLING 

MODULES:  None 


NAME: 


BLOCKSUM 


NUMBER:  16

PURPOSE:  The  BLOCKSUM  module  performs  the  final  calculation  of  block 

level  values. 

TECHNIQUE:  BLOCKSUM calculates NSNB and normalizes all measures
previously calculated. The measures are normalized by table
interpolation. The tables are provided in Appendix F.

For  values  producing  a normalized  measure  less  than  5%,  5% 
will  be  used.  For  values  producing  a normalized  measure 
greater  than  95%,  95%  will  be  used. 

For  values  where  the  supplied  interpolation  tables  reach 
the  extremes  of  0 and/or  1 over  a range  of  values,  the 
value  of  the  corresponding  measure  must  be  set  equal  to 
the  center  of  the  band  of  extreme  values. 

For example, with the following table:

Measure    Normalized Value

1.0        90
1.0        80
1.0        70
1.0        60
 .8        50
 .6        40

A measure value of 1.0 would yield a normalized value of 75.
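
The normalization rules can be sketched in Python as below. The specification leaves the exact interpolation method open, so this is only one possible reading: linear interpolation between neighbouring table rows is assumed, the table is assumed to rise monotonically, values outside the table take the nearest end value, and the function name `normalize` is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def normalize(value: float, table: Iterable[Tuple[float, float]],
              low: float = 5.0, high: float = 95.0) -> float:
    """Normalize a measure by table interpolation, limited to low..high percent."""
    bands = defaultdict(list)  # measure value -> all normalized values listed
    for measure, norm in table:
        bands[measure].append(norm)
    xs = sorted(bands)
    if value in bands:
        norms = bands[value]
        result = (min(norms) + max(norms)) / 2  # center of the band
    elif value < xs[0]:
        result = min(bands[xs[0]])
    elif value > xs[-1]:
        result = max(bands[xs[-1]])
    else:
        x0 = max(x for x in xs if x < value)
        x1 = min(x for x in xs if x > value)
        y0, y1 = max(bands[x0]), min(bands[x1])  # nearest edges of each band
        result = y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    return min(high, max(low, result))
```

With the example table, `normalize(1.0, table)` returns 75, the center of the 60 to 90 band, and a measure of .7 falls midway between the .6 and .8 rows at about 45.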


The composite index formula depends on the settings of
READERHIGH, READERLOW, and READERHILOW. One or more may
be true, and for each one true a composite index is calculated:

COMPINDEXHIGH = - .132 * MMUN + .171 * ESIN + .418 * YDN
+ .397 * TCN + .302 * CMUN + .089 * MDN - .320 *
DMUN + .167 * RBN - .509
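
The COMPINDEXHIGH formula is a weighted sum of normalized measures plus a constant, which can be sketched in Python as follows. The coefficients are those read from the formula above; the function name and the dictionary interface are assumptions made for illustration.

```python
# Coefficients as read from the COMPINDEXHIGH formula in the specification.
COMPINDEXHIGH_WEIGHTS = {
    "MMUN": -0.132,
    "ESIN": 0.171,
    "YDN": 0.418,
    "TCN": 0.397,
    "CMUN": 0.302,
    "MDN": 0.089,
    "DMUN": -0.320,
    "RBN": 0.167,
}
COMPINDEXHIGH_CONSTANT = -0.509


def compindex_high(normalized: dict) -> float:
    """Composite index for the READERHIGH setting from normalized measures."""
    weighted = sum(
        weight * normalized[name]
        for name, weight in COMPINDEXHIGH_WEIGHTS.items()
    )
    return weighted + COMPINDEXHIGH_CONSTANT
```

If every normalized measure were 0 the index would equal the constant, -.509; if every one were 1 it would be the sum of the coefficients plus the constant, .583.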
COMPINDEXHILO = .132 * MMUN + .164 * ESIN + .200 * YDN - .207 *
CEN + .250 * RBN - .151 * MDN - .289 * NMIN - .074 *
LBN - .003

COMPINDEXLOW = .169 * CMUN [illegible] .173 * MMUN + [illegible] * ESIN
+ .190 * DMUN + .335 * YDN + .260 * TCN + .242 * RBN - .344

INPUT: 

FILES 

ACCESSED: 


Results from previous modules.


None 


GLOBAL 

DATA: 

OUTPUT: 


Updates  NSNB  and  calculates  normalized  measure. 

The  output  of  the  BLOCKSUM  module  is  the  normalized  measures 
and  NSNB. 


MODULES

CALLED: 


None 


CALLING 

MODULES: 


None 


COMMENTS:  The norm data must be included in this module. BLOCKSUM
is a simple computation module. The exact interpolation method
can be determined later.




PROGRAM  MODULE  SPECIFICATION 


NAME: 

CHECKOUT 

NUMBER: 

17 

PURPOSE: 

CHECKOUT displays the block results of the dictionary
check.

TECHNIQUE: 

CHECKOUT  is  a report  generator  using  values  previously 
calculated. 

INPUT: 

None 

FILES 

ACCESSED: 

None 

GLOBAL 

DATA: 

Uses BLOCKWORD TABLE and TNWB and NOTINDICT.

OUTPUT: 

The  output  of  CHECKOUT  is  a report  on  the  results  of  the 
dictionary  check. 

MODULES 

CALLED : 

None 

CALLING 

MODULES: 

None 

COMMENTS: 

None 
NAME:


MEASUREOUT 


NUMBER:  18 

PURPOSE:  MEASUREOUT displays the measures calculated for each
block.


TECHNIQUE: 

INPUT: 

FILES 

ACCESSED: 

GLOBAL 

DATA: 


MEASUREOUT is a report generator using values previously
calculated.

None


OUTPUT  FILES 

Uses  all  measures  previously  calculated. 


OUTPUT:  The  output  of  MEASUREOUT  is  a report  on  the  block's  measures. 

MODULES 

CALLED:  None 


CALLING 

MODULES:  None 

COMMENTS:  MEASUREOUT is a trivial report generator.
NAME:  RUNSUM

NUMBER:  19

PURPOSE:  The RUNSUM module summarizes the results of a run in
preparation for the RUNOUT report module.

The RUNSUM module updates the dictionary index information
to reflect the current run.

TECHNIQUE:  RUNSUM takes totals calculated by BLOCKSUM at the end
of each block and computes averages.

INPUT:  Totals  from  BLOCKSUM. 

FILES 

ACCESSED:  None 

GLOBAL 

DATA:  Totals  from  BLOCKSUM. 

OUTPUT:  The output of the RUNSUM module is a set of averages to be
used in the RUNOUT module.

MODULES 

CALLED:  None 

CALLING 

MODULES:  None 

COMMENTS:  The  RUNSUM  module  is  a simple  calculation  module. 


NAME: 


RUNOUT 


NUMBER: 

20 

PURPOSE:

RUNOUT displays the final run results.

TECHNIQUE:

RUNOUT is a report generator using values previously
calculated.

INPUT: 

None 

FILES 

ACCESSED: 

OUTPUT  FILE 

GLOBAL 

DATA: 

Uses data calculated at end of each block.

OUTPUT: 

The output of RUNOUT is a final run report. See Figure D-4
for format and content.

MODULES 

CALLED: 

None 

CALLING 

MODULES: 

None 

COMMENTS: 

RUNOUT is a report generator using values calculated at
the conclusion of each block. In practice, RUNOUT may be called
for each block.




RUN  REQUEST  SYNTAX 


Appendix  C defines  the  format,  content,  and  sequence  for  a user  of 
the  CM  program  to  enter  run  requests  into  the  computer.  This  information, 
termed  run  request  syntax  is  specified  in  the  form  of  a series  of  syntax 
diagrams  which  will  apply  whether  the  input  is  prepared  on  punched  cards 
and  read  via  a local  or  remote  card  reader  or  is  entered  via  a keyboard  on 
a local  or  remote  interactive  computer  terminal. 

The  syntax  diagram  was  selected  to  form  the  basis  for  this  appendix 
because  it  affords  a concise  exposition  of  a syntax  involving  defaults, 
alternatives,  and  iterations;  it  is  rigorous  without  being  cumbersome. 

There  are  few  formal  rules:  The  basic  rule  is  that  any  path  traced  along 

the  forward  direction  of  the  arrows  will  produce  a syntactically  correct 
component  of  the  run  request  language. 

There are two kinds of diagram components: terminals and non-terminals.

Non-terminals  are  items  which  have  their  own  tracks  and  are  indicated  by 
< >.  Terminals  are  items  which  do  not  have  their  own  tracks.  These  are 
either  special  symbols  or  "words."  Words  which  are  entirely  underlined 
must  be  completely  specified.  Words  partially  underlined  can  be  abbreviated 
by  the  underlined  part  or  any  part  of  the  entire  word  containing  at  least 
the underlined part. Thus, → SENTENCES → (with SENT underlined) can be
represented by any of

SENT  SENTENCE  SENTEN

but  SEN  SENTNC  SENTENCESX

are all unacceptable.
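
The abbreviation rule can be sketched as a short Python check. The function name and its parameters are hypothetical, and the sketch assumes that the underlined part is always a leading portion of the word, so that the acceptable forms are the prefixes of the word that are at least as long as the underlined part.

```python
def matches_keyword(token: str, keyword: str, underlined: int) -> bool:
    """True if token is an acceptable abbreviation of keyword.

    underlined is the number of leading characters of keyword that are
    underlined in the syntax diagram and therefore must be present.
    """
    token = token.upper()
    return len(token) >= underlined and keyword.startswith(token)
```

With `keyword="SENTENCES"` and `underlined=4`, the forms SENT, SENTENCE, and SENTEN are accepted, while SEN (too short), SENTNC (not a leading part of the word), and SENTENCESX (longer than the word) are rejected.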

Iteration is noted by {n} in a track: the horizontal directed
line containing the {n} can be passed at most n times (but can be passed
0 times). If an asterisk (*) follows the number, then the containing
directed horizontal line must be passed at least once.

Basically, two types of requests are permissible: those requesting
a check of text against the dictionary (COMMAND: CHECK) and those
requesting a calculation of comprehensibility measures (COMMAND: MEASURE).
PURPOSE • Initiates calculation of comprehensibility measures for a
specified part of a specified file. Each syntax element is
elaborated on in subsequent sections.

EXAMPLE • MEASURE USING FILETWO FROM LINE 1000 THRU END, BLOCK SIZE
50 SENT, LIST TO PRINTER TEXT, COMP SUMMARY, INTERACTIVE.




+ CHECK 


PURPOSE • Initiates a check to determine whether or not all words in the
specified part of the specified file are contained in the
dictionary.


EXAMPLE • CHECK USING FILEONE, FROM START, FOR 14 BLOCKS, BLOCK SIZE
400 WORDS, LIST TO REM.


< FILE ID > ::= {name of file & media information}


PURPOSE • Defines the name and any necessary media information for an
input file (text, dictionary, etc.)


COMMENTS • The exact form of a < FILE ID > will depend on the particular
implementation system and language.
< TEXT SPEC > ::=

→ USING → < FILE ID > →

PURPOSE  • Defines  the  input  text  to  be  processed. 

COMMENTS • The text file will consist of fixed-length blocks containing
fixed-length records. The file will consist of two parts:
header and text. The header portion will consist of information
about the text file including text title and author,
classification, creation and revision dates and versions. The
text portion will contain the actual text, one line per record.

The  exact  format  of  the  header  records  and  the  file's  record 
and  block  sizes  will  be  defined  during  the  program  development. 


< START  PLACE  > 


PURPOSE  • Defines  starting  location  in  file. 

COMMENTS  • FROM  START:  start  at  beginning  of  file 

(line  1 word  1) 

FROM LINE < line # > WORD < word # > :

start at word < word # > on line < line # >
< line # > ≥ 1, < word # > ≥ 1.

FROM LINE < line # > :

start at word 1 on line < line # >


If no < start place > is provided, FROM START is assumed.
< END PLACE >


PURPOSE  • Define  ending  location  in  file. 

COMMENTS  • THRU  END:  to  end  of  file 

THRU LINE < line # > WORD < word # > :

last word is word < word # > on line < line # >.

THRU LINE < line # > :

last word is final word on line < line # >.

FOR  < # blocks  > BLOCKS  : < # blocks  > text  blocks 

will  be  processed. 

If  no  < end  place  > is  provided,  THRU  END  is  assumed. 

< DICTIONARY SPEC > ::=

→ DICTIONARY → < FILE ID > →

PURPOSE  • Specifies  the  dictionary  to  be  used. 

COMMENTS  • If  no  < dictionary  spec  > is  provided,  the  standard  dictionary 
will  be  used.  The  format  of  dictionary  files  and  the  name  of 
the  default  dictionary  will  be  specified  during  the  development 
phase. 
< BLOCK SPEC >


PURPOSE • Specifies the size of a text block.

COMMENTS • BLOCK SIZE < # words > WORDS : blocks will be formed of
at least < # words > words such that a block contains only
complete sentences. 100 <= < # words >.

BLOCK SIZE < # sentences > SENTENCES : blocks will be
formed of < # sentences > sentences.

BLOCK MARKS : the source will define the blocks by means of
embedded < block mark > (depending on the particular system and
text editor, it may be necessary to specify the < block mark >
in the request syntax).

If no < block spec > is provided, BLOCK SIZE 500 WORDS is assumed.
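
The BLOCK SIZE < # words > WORDS rule can be sketched as follows. This is only an illustration under stated assumptions, not the program's actual logic: the function name `form_blocks`, the representation of a sentence as a list of words, and the decision to emit a short final block are all hypothetical.

```python
from typing import List

Sentence = List[str]


def form_blocks(sentences: List[Sentence], min_words: int = 500) -> List[List[Sentence]]:
    """Group whole sentences into blocks of at least `min_words` words.

    Assumption: a trailing block with fewer than `min_words` words is
    still emitted rather than discarded.
    """
    if min_words < 100:
        # The specification requires 100 <= < # words >.
        raise ValueError("block size must be at least 100 words")
    blocks: List[List[Sentence]] = []
    current: List[Sentence] = []
    count = 0
    for sentence in sentences:
        current.append(sentence)       # never split a sentence
        count += len(sentence)
        if count >= min_words:         # block is full once the minimum is reached
            blocks.append(current)
            current, count = [], 0
    if current:
        blocks.append(current)
    return blocks
```

Because a block closes only at a sentence boundary, a block usually holds somewhat more than the requested number of words.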


< LIST LOCATION > ::=


PURPOSE • Specifies location of output display or listings.

COMMENTS • TERMINAL is the originating remote station/terminal. If no
< list location > is specified, LIST TO PRINTER is assumed.
< LIST OPTION > ::=

-> TEXT -> WITH -> LINE -> NUMBERS

PURPOSE • Specifies optional list items.

COMMENTS • If specified, the text blocks reported on will be listed.
If "WITH LINE NUMBERS" is included, the line numbers will
be printed with each text line.

If no < list option > is specified, no text lines will be printed.


< MEASURE  OPTION  > 


COMPREHENSIBILITY 


MEASURE


SUMMARY 


DETAIL 


COMMENTS • If no < measure option > is specified, COMP SUMMARY is
assumed.


CMU - Cognition of Semantic Units
CMR - Cognition of Semantic Relations
MMU - Memory of Semantic Units
ESI - Evaluation of Symbolic Implications
NMI - Convergent Production of Semantic Implications
DMU - Divergent Production of Semantic Units
YD - Yngve depth
MD - Morpheme depth
TC - Transformational complexity
CL - Center embedding
LB - Left branching
RB - Right branching
DC - Deleting complements

There are five types of report that may be provided:

1. Sentence detail

2. Dictionary check

3. Block results

4. Run summary

5. RGL report

Only report 2 is provided on a "CHECK" run. For a "MEASURE" run:

Report 1 is provided if "DETAIL" is specified.

Report 3 is provided if "SUMMARY" is specified.

Report 4 is always provided.

Report 5 is provided if "RGL" is specified.
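
The report-selection rules above can be sketched as a small function. This is a hedged illustration: the name `reports_for_run` and its parameters are hypothetical, and reading the partly illegible report numbers as 2 (dictionary check, on a CHECK run) and 3 (block results, for SUMMARY) is an assumption drawn from the report list.

```python
from typing import List


def reports_for_run(run_type: str, detail: bool = False,
                    summary: bool = True, rgl: bool = False) -> List[int]:
    """Return the report numbers produced for a run.

    Assumptions: report 2 is the only CHECK-run report, and report 3
    corresponds to the SUMMARY option (the default measure option).
    """
    if run_type == "CHECK":
        return [2]
    if run_type != "MEASURE":
        raise ValueError("run_type must be 'CHECK' or 'MEASURE'")
    reports: List[int] = []
    if detail:
        reports.append(1)   # sentence detail
    if summary:
        reports.append(3)   # block results
    reports.append(4)       # run summary is always provided
    if rgl:
        reports.append(5)   # RGL report
    return reports
```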
< MODE >


-> BATCH

-> INTERACTIVE

-> //


PURPOSE • Activates batch processing.

COMMENTS • If BATCH, the run cannot respond to any system questions or
notifications. If INTERACTIVE, the user can respond to system
questions or notifications (such as word not in dictionary).

If  no  < mode  > is  specified,  BATCH  is  assumed. 


< SAMPLE SPEC > ::=

-> EVERY -> < # words > -> WORDS -> //

PURPOSE • Specifies that the dictionary check is to be performed only
for one out of each < # words > words.

COMMENTS • < # words > is an integer greater than 1.

The percentage of words not found in the dictionary will be
reported.

If no < sample spec > is provided, each word will be checked, i.e.,
< # words > will be set to 1.
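
A minimal sketch of the sampled dictionary check follows. The function name `percent_not_found`, the choice to check the first word of each group of < # words > words, and the case folding are assumptions made for illustration; the specification does not say which word of each group is sampled.

```python
from typing import Iterable, Set


def percent_not_found(words: Iterable[str], dictionary: Set[str],
                      sample_size: int = 1) -> float:
    """Check one out of every `sample_size` words against the dictionary.

    Returns the percentage of checked words that were not found.
    Assumption: the first word of each group of `sample_size` is checked.
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    checked = 0
    missing = 0
    for position, word in enumerate(words):
        if position % sample_size != 0:
            continue                      # skip unsampled words
        checked += 1
        if word.lower() not in dictionary:
            missing += 1
    return 100.0 * missing / checked if checked else 0.0
```

With `sample_size` left at 1, every word is checked, which matches the default described above.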
< ABORT SPEC >


PURPOSE • To abort analysis if current dictionary is incomplete.

COMMENTS • If specified, the entire run will be aborted if more than
< integer > percent of the words are not in the dictionary.
< integer > between 0 and 100 inclusive. The abort action
will be considered only at the end of each text block. If
the percentage of words not in the dictionary at the end of
a text block exceeds < integer >, all measures and statistics
will be calculated and reported thru the aborting block,
including all summary information for the run. No additional
text blocks will be processed.

If no < abort spec > is provided, all text blocks will be processed
regardless of the number of unidentified words.
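
The abort rule can be sketched as below. This is an illustration only: the name `process_blocks` is hypothetical, blocks are modelled as lists of words, and the percentage is assumed to be cumulative over the run (the specification does not state whether it is computed per block or per run).

```python
from typing import List, Optional, Set, Tuple


def process_blocks(blocks: List[List[str]], dictionary: Set[str],
                   abort_percent: Optional[int] = None) -> Tuple[int, bool]:
    """Process blocks in order, testing the abort condition after each one.

    Returns (number of blocks processed, whether the run was aborted).
    Assumption: the not-in-dictionary percentage is cumulative for the run.
    """
    if abort_percent is not None and not 0 <= abort_percent <= 100:
        raise ValueError("abort percent must be between 0 and 100 inclusive")
    total = 0
    missing = 0
    processed = 0
    for block in blocks:
        for word in block:
            total += 1
            if word not in dictionary:
                missing += 1
        processed += 1                     # the aborting block is still reported
        if abort_percent is not None and total:
            if 100.0 * missing / total > abort_percent:
                return processed, True     # no further blocks are processed
    return processed, False
```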


PURPOSE • Specifies the name of the example file.

COMMENTS • The file will contain the words and phrases required as
input to the Divergent Production of Semantic Units (DMU) measure.

If no < example spec > is provided, the standard
example file will be used.

The format of the example file and the name of the default
example file will be specified during the development phase.
< NORM SPEC >


PURPOSE  • Specifies  the  norm(s)  to  which  the  current  run  is  to  be 
compared. 

COMMENTS  • MANUAL  - manual 

CDC  - Career  Development  Course 

TO  - Technical  Order 

SG  - Study  Guide 

OVERALL  - Overall 

If no < norm spec > is provided, OVERALL is assumed.

< SUBJECT  READER  SPEC  > ::= 


PURPOSE • Specifies the subject's reading skill level(s).

COMMENTS  • One  or  more  levels  may  be  selected.  The  subject  reader 
specification  value(s)  will  determine  which  composite 
index  formula(s)  will  be  used. 

If  no  < subject  reader  spec  > is  provided,  all  three  are  assumed. 
< CLICHE SPEC >


-> CLICHE -> < file id > -> //

PURPOSE  • Specifies  the  name  of  the  cliche  file. 

COMMENTS  • If  no  < cliche  spec  > is  provided,  the  standard  cliche  file 
will  be  used. 

The  format  of  the  cliche  file  and  the  name  of  the  default 
cliche  file  will  be  specified  during  the  development  phase. 


< PARSE LIMIT SPEC > ::=

-> PARSE -> < integer > -> //

PURPOSE • Specifies the maximum number of parses that will be permitted
per sentence. < integer > between 1 and 100 inclusive.

COMMENTS • If no < parse limit spec > is provided, no limit will be placed
on the number of possible parses per sentence.
< COMMENT >


-> COMMENT -> " -> < any string not containing a " > -> " -> //

PURPOSE  • Specifies  a comment  or  title  that  will  appear  at  the  top 

of  each  printed  report  page  for  this  run.  The  string  can  be 
any  series  of  printable  characters  not  containing  a quote 
(").  The  entire  string  and  its  surrounding  quotes  must  be 
on  the  same  input  record. 

EXAMPLE: 

COMMENT  "PRELIMINARY  REFERENCE  MANUAL  #2" 

COMMENTS • If no comment is provided, the following title is assumed:

"Comprehensibility  Measures  Program" 
APPENDIX D


Output  Formats 


Figure  D-2.  Format  of  output  for  check  of  words  in  dictionary. 


APPENDIX  E 


Names  of  Global  and  Other  Data  Items 


E-1 Global Items
E-2  Measure  Items 
Run  Level 
Block  Level 
Sentence  Level 
E-3  Measures  Sentence  Data 
Array  & Other  Structures 
Each item in Appendix E is shown in the form:


'name' <type> <initial value> <reference list> <description>

'name' is the name of the item as used in the program module descriptions.
In actual implementation, name changes may be required to correspond
to language requirements.

<type> is one of

F file 

B boolean,  either  true  or  false 

D discrete, can have one of a list of discrete values. Followed by
the possible values. Note, a discrete item is never true or false,
and can never be used in an arithmetic calculation. It can only be
set to one of its possible values and compared to one of its
possible values.

R real 

A alpha  (string  of  characters) 

I pointer  or  index 

If  followed  by  ARRAY  then  item  is  an  array  of  items. 

<initial value> is indicated by a value in square brackets, e.g., [0].
This is the value the item is set to at run initialization. If no
initial value is specified, its initial value can be anything.

Initial values are set in module 1. An initial value of [∞] implies
the largest possible integer value.

<reference list> Each module that references an item has its module
number in this list. If followed by an asterisk then the value
of the item may be changed in that module. The module "C" denotes
the common control logic outside of any specific module.

<description>  is  the  purpose  of  the  item. 
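
As an illustration of the reference-list notation, the following sketch splits a list such as `2*,17,20,C` into module identifiers and a flag showing whether the module may change the item. The function name `parse_reference_list` and the tuple representation are assumptions, not part of the program design.

```python
from typing import List, Tuple


def parse_reference_list(text: str) -> List[Tuple[str, bool]]:
    """Parse a reference list into (module, may_modify) pairs.

    A trailing asterisk marks a module that may change the item's value;
    "C" denotes the common control logic outside any specific module.
    """
    references: List[Tuple[str, bool]] = []
    for entry in text.split(","):
        entry = entry.strip()
        if not entry:
            continue                      # tolerate stray commas
        may_modify = entry.endswith("*")
        module = entry.rstrip("*").strip()
        references.append((module, may_modify))
    return references
```

For example, `parse_reference_list("2*,17,20,C")` marks module 2 as the only module that may set the item.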
GLOBAL ITEMS


Files 

CLICHEFILE F 2,7

list of cliches

DICTFILE F 2,7

dictionary

EXAMPLEFILE F 2,7

list of example introducer words and phrases

INPUTFILE F 1,4

user input file (card or remote).

REPORTBLOCK F 2,7,18

printer file for block summary.

REPORTCHECK F 2,17

printer file for check run summary.

REPORTRGL F 2,15

printer file for RGL summary.

REPORTSENTENCE F 2,13

printer file for sentence summary.

REPORTSUMMARY F 2,5,20

printer file for run summary.

TEXTFILE F 2,3,7

text file.

General  Variables 

ABORTPERCENT R [100] 2*,C

percentage of words not in dictionary to total words that is to cause
a run to abort. See <abort spec>.

ARI R 14*,15,18

Automated Readability Index

ARIRGL R 14*,15,18

Automated Readability Index Reading Grade Level

BLOCKCOUNT R 2*,13,C

number of blocks to process. See <end place>.

BLOCKNUMBER R [0] 3*

number  of  block  being  processed. 

BLOCKSIZE  R [500]  2*, 17, 20, C 

size  of  each  block  (units  determined  by  BLOCKTYPE) . See  <block  spec>. 

BLOCKTYPE D: BLOCKINWORDS/BLOCKINSENTENCES/BLOCKMARKS
[BLOCKINWORDS] 2*,17,20,C

indicates the units of block length. See <block spec>.

COMMENT A 2*,13,15,17,18,20

holds comment string for page header. See <comment>.

COMPINDEXHIGH R 16*,20
COMPINDEXHILOW R 16*,20
COMPINDEXLOW R 16*,20

contains  composite  index  based  on  reader  skill. 


CURRENTLINE  R 3* 

current  text  line  number  in  progress. 

ENDLINE R [∞] 2*,17,20,C

last line to be processed. See <end place>.

ENDTYPE D: THRUEND/COUNTEND [THRUEND] 2*,20,C

type of <end place>. If THRUEND then ENDLINE and ENDWORD are
valid. If COUNTEND then BLOCKCOUNT is valid. See <end place>.

ENDWORD R [∞] 2*,17,20,C

last word to process on ENDLINE. See <end place>.

ERRORCOUNT R [0] 5*

number of syntax errors in a run.

FLESCHRL R 14*,15,18

Flesch Reading Grade Level.

FORCASTRGL R 14*,15,18

Forcast Reading Grade Level.


FOUND B 7*,8

true if current word is in dictionary.


HAVEERROR B [false] 2,5*

true when there is a syntax error on a request.


LISTLINE B [false] 2*,7

true if line numbers are to be included in text list.
See <list option>.
LISTLOC D: LISTTOPRINTER/LISTTOREMOTE [LISTTOPRINTER] 2*,7

where to list output reports. See <list location>.

LISTTEXT B [false] 2*,7

true if text is to be listed in block summary report. See <list option>.

MAXPARSE R [∞] 2*,20

maximum number of parses per sentence permitted. See <parse limit spec>.

MEASURE B array [true] 2*,10,11

one item for each measure: true if that measure is to be computed. See
<measure option>.

MODE D: BATCH/INTERACTIVE 2*,7

indicates whether the user is running the program from batch or
interactively. See <mode>.

NDNSAVE R 3*,8*,10

saved value of NDNB when block goes over 100 words.

NDWSAVE R 3*,8*,10

saved value of NDWB when block goes over 100 words.

NORMCDC B [false] 2*,20 (Career Development Course)
NORMMAN B [false] 2*,20 (manual)
NORMOVERALL B [true] 2*,20 (overall)
NORMSG B [false] 2*,20 (study guide)
NORMTO B [false] 2*,20 (Technical Order)

true if norm is to be calculated on specified basis. See <norm spec>.

NOTINDICT R [0] 7*,17

total number of words not in dictionary.

NUMREFS R [0] 7*

number of references made to dictionary in this run.

PARSE-TABLE array 9*,11

holds parse information for current sentence.

READERHIGH B [true] 2*,16,20
READERHILOW B [true] 2*,16,20
READERLOW B [true] 2*,16,20

true if the reading skill of the subject is to be high, high/low, or
low. See <subject reader spec>.


RUNTYPE D: CHECKRUN/MEASURERUN 2*,C

indicates type of run.

SAMPLESIZE R [1] 2*,17

for CHECK run only: perform dictionary check on one of every
SAMPLESIZE words. See <sample spec>.

SCANINTOKEN A 2,4*,5,7

current token from SCANINPUT.

SCANINTYPE D: TOKENISWORD/TOKENISNUMBER/TOKENISFILED/TOKENISSYMBOL/
TOKENISEOF 2,4*,7

type of token returned from SCANINPUT.

SCANINVALUE R 2,4*,7

if token returned from SCANINPUT is a number (SCANINTYPE = TOKENISNUMBER),
then SCANINVALUE is the integral value of the current token.

SCANPOINTER I [81] 4*

pointer to current column in SCANRECORD.

SCANRECORD A array 4*

current input record

SCANSAMETOKEN B [false] 2*,4*,7*

if true, SCANINPUT will look at the same token as the last call.
Always FALSE after a call on SCANINPUT.

STARTLINE R [1] 2*,3,17,20

starting text line. See <start place>.

STARTWORD R [1] 2*,3,17,20

starting text word on STARTLINE. See <start place>.
MEASURE ITEMS


Run Level

TNWR R [0] 8*,17

total number of words processed


Block  Level 

NDWB R 3*,8*,10,18
number of different words in a text block

TNWB R 3*,8*,10,14,18
total number of words in a text block

NAMB R 3*,8*,18
number of multiple word abbreviations in a text block

NASB R 3*,8*,18
number of single word abbreviations in a text block

NSNB R 3*,8*,10,16*,18
number of shared nouns in a text block

NORB R 3*,8*,10,18
number of references in a text block

NDNB R 3*,8*,10,18
number of different nouns in a text block

NSWB R 3*,8*,10,18
number of abbreviated or symbolic words in a text block

TNSB R 3*,6*,10,11*,13,14,18
number of sentences in a text block

TNMB R 3*,8*,18
number of morphemes in a text block

NPPB R 3*,9*,10,18
number of potential parses in a text block

TNEB R 3*,8*,10,18
number of elucidations in a text block

NWNDB R 3*,7*,18
number of words not in dictionary in a text block

TPSB R 3*,8*,18
number of parts of speech of all words in a text block

TCLB R 3*,8*,18
number of clauses on left of noun in a text block
TNCB R 3*,8*,14,18
number of characters in a text block

OSWB R 3*,8*,14,18
number of one syllable words in a text block

TSCB R 3*,8*,14,18
number of syllables in all words in a text block

NCRB R [0] 12*,16
NCLB R [0] 12*,16
number of modifying clauses on the right/left of the object noun for
a block


Sentence Level

NPPS R 6*,13
number of potential parses in a sentence

NESS R 6*,13
number of explanations in a sentence

TNWS R 6*,7*,11,13
number of words in a sentence

YDS R 6*,11*,13
Yngve depth of a sentence

TCS R 6*,13
Transformational Complexity of a sentence

NNPS R 6*,10*,12,13,16
number of noun phrases to the right of the subject verb in a sentence

NCRS R 6*,13
NCLS R 6*,13
number of modifying clauses on the right/left of the object noun phrase
of a sentence

DCS R 6*,13
Deleted Complement of a sentence
Measures


For each measure there are three items.

1. The measure itself (a two or three-character name), e.g., CMU.

2. A normalized value of the measure (measure followed by "N"),
e.g., CMUN.

3. A total measure used to compute averages (measure followed by "T"),
e.g., CMUT.

All items are of type R.

The measures are divided into two categories:

Structure-of-intellect: (referenced in 10*,16,18,20)

CMU - Cognition of Semantic Units

CMR - Cognition of Semantic Relations

MMU - Memory of Semantic Units

ESI - Evaluation of Symbolic Implications

NMI - Convergent Production of Semantic Implications

DMU - Divergent Production of Semantic Units

Psycholinguistic: (referenced in 11*,16,18,20)

YD - Yngve Depth

MD - Morpheme Depth

TC - Transformational Complexity

SE - Center Embedding

LB - Left Branching

RB - Right Branching

DC - Deleted Complement

The 13 normalized values (CMUN, CMRN, MMUN, ESIN, NMIN, DMUN, YDN, MDN, TCN,
SEN, LBN, RBN, DCN) are referenced in 16*, 18, 20.

The 13 total values (CMUT, CMRT, MMUT, ESIT, NMIT, DMUT, YDT, MDT, TCT, SET,
LBT, RBT, DCT) are referenced in 18*, 20.
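
The naming convention for the three items per measure can be sketched as follows. The helper names are hypothetical, and the assumption that a "T" item accumulates block values and is divided by the number of blocks to give an average is an illustration only; the text says only that totals are used to compute averages.

```python
from collections import defaultdict
from typing import Dict

# Measure names as listed in this appendix.
MEASURES = ["CMU", "CMR", "MMU", "ESI", "NMI", "DMU",
            "YD", "MD", "TC", "SE", "LB", "RB", "DC"]


def normalized_name(measure: str) -> str:
    """Name of the normalized value, e.g. CMU -> CMUN."""
    return measure + "N"


def total_name(measure: str) -> str:
    """Name of the total used for averages, e.g. CMU -> CMUT."""
    return measure + "T"


def accumulate(totals: Dict[str, float], block_values: Dict[str, float]) -> None:
    """Add one block's measure values into the running totals (assumed usage)."""
    for measure, value in block_values.items():
        totals[total_name(measure)] += value


def averages(totals: Dict[str, float], block_count: int) -> Dict[str, float]:
    """Average each total over the number of blocks processed."""
    return {name[:-1]: total / block_count for name, total in totals.items()}
```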
E-3


SENTENCE DATA ARRAY AND OTHER STRUCTURES


Sentence Data Array

(one element of each for each word in a sentence)

NOMORE       R        7*,8,9    number of morphemes
NOPARTS      R        7*,8,9    number of parts of speech
PART[*]      R array  7         parts of speech (one item for each part of speech)
NOSYLLABLES  R        7*,       number of syllables
NEGIND       B        7*,9      negative indicator
SYMBOLIND    B        7*,8      symbolic/abbreviation indicator
NOWORDS      R        7*,8,9    number of words
NOREFS       R        7*,8      number of references to word
WORD         A        7*        word
EXAMPLE      B        7*        example indicator
CLICHE       B        7*        cliche indicator


Dictionary File Items (items have same meaning as corresponding item in
SENTENCE DATA ARRAY)

DNOMORE       R        7
DNOPARTS      R        7
DPART[*]      R array  7
DNOSYLLABLES  R        7
DNEGIND       B        7
DSYMBOLIND    B        7
DNOWORDS      R        7
DNOREFS       R        7*
DWORD         A        7


header information:

DICTDATE    R        dictionary creation date
TOTALPARTS  R        total number of parts of speech
REFERENCES  R   19*  total number of references to dictionary


TABLE

(one entry for each different word in a block)

BWORD      A  8*,17  word
BNON       B  8*,17  noun indicator
BNOTFOUND  B  8*,17  true if word not in dictionary
BLINE      R  8*,17  line number of first occurrence if BNOTFOUND is true
BCOUNT     R  8*,17  number of times word used if not in dictionary
BMULTY     B  8*,17  true if used more than once in a text block

APPENDIX F


DEFINITIONS/ABBREVIATIONS  FOR  SENTENCE  ELEMENT  CATEGORY 
(PARTS  OF  SPEECH  AND  PHRASE  STRUCTURE)  SYMBOLS 


S       sentence
VP      verb phrase
NP      noun phrase
M       modifier
N       noun
PRTP    participial phrase
PREPP   prepositional phrase
PP      past participle
V       verb
LV      linking verb (is)
P       preposition
CONJ    conjunction
NEG     negative
PRN     pronoun
RP      relative pronoun
MG      modifier group
ADV     adverb
DET     determiner
PRT     participle
EXP     introducer of explanation
NABBR   noun abbreviation
AABBR   adjective abbreviation
PN      proper noun
APOSP   apposite phrase
NPC     noun phrase complement
VPC     verb phrase complement
COM     complementizer
PART    particle
INFP    infinitive phrase
INF     infinitive
AUX     auxiliary
RELCL   relative clause
ADVCL   adverbial clause
ADVP    adverbial phrase
CL      clause
C       conjoint
SA      sentence taking adverb
MP      modifier phrase
PURPCL  purpose clause
VB      verbal
ADVCL

ADVCL -> CONJ S
ADVCL -> ADV S
ADVCL -> NP ADV S

ADVP

ADVP -> CONJ PARTP

AUX

AUX -> AUX NEG

C

C -> CONJ S
C -> "to" S (the word "to")

CONJ

CONJ -> M ADV

INFP and INF

INFP -> INFP VP
INF -> "to" LV

NP

NP -> PRN
NP -> NP PP
NP -> N MP
NP -> DET NP
NP -> DET N
NP -> M N
NP -> NP CONJ
NP -> CONJ NP
NP -> N CONJ N
NP -> N NP
NP -> M NP
NP -> N CONJ N
NP -> N RELCL
NP -> MP N
NP -> CONJ NP

NPC

NPC -> COM S

PRTP

PRTP -> PRTP PP
PRTP -> PRT NP

PP

PP -> PP CONJ PP
PP -> P NP
PP -> PP PP
PP -> P N

S

S -> ADVCL S
S -> NP VP
S -> PURPOSECL S
S -> S PURPOSECL
S -> S CONJ S
     or S -> S CONJOINT
        CONJOINT -> CONJ S
S -> PP S
S -> CONJP S
S -> S NPC
S -> VP VPC
S -> CL S
S -> PP VP
S -> C S
S -> PRTP S
S -> VP PP
S -> VP
S -> S CONJ S
S -> VP CL

VP

VP -> V NP
VP -> VB NP
VP -> V NP PNP
      or VP -> VP PREPP
      or NP -> NP PREPP
         PREPP -> P NP
VP -> V NP
VP -> VP VPC
VP -> ADV PRT
VP -> ADV VP
VP -> VP ADV
VP -> VB
VP -> LV M
VP -> VB ADV
VP -> AUX VP
VP -> VP VB
VP -> VB NEG
VP -> LV VB
VP -> LV PRT
VP -> VP PP
VP -> VP VPC
VP -> VP CONJ VP
VP -> ADVP VP
VP -> VB NP
APPENDIX G

Norms  for  Each  Measure 
SG = Study Guides
MAN = Manuals
CDC = Career Development Course
TO = Technical Order
O/A = Overall


Convergent Production of Semantic Systems (NMS)
6. Convergent Production of Semantic Implications (NMI)
Morpheme Depth (MD)
10. Transformational Complexity (TC)